MULTI-MICROPHONE ADAPTIVE NOISE REDUCTION TECHNIQUES

FOR SPEECH ENHANCEMENT

I. Background

In speech communication applications, such as teleconferencing, hands-free telephony and hearing aids, the presence of background noise and/or reverberation may significantly reduce the intelligibility of the desired speech signal. This stems from the large distance between the speaker and the microphone(s). Hence the use of a noise reduction algorithm is necessary. Multi-microphone systems exploit spatial information in addition to temporal and spectral information of the desired signal and noise signal and are thus preferred to single-microphone procedures (such as spectral subtraction). For aesthetic reasons, multi-microphone techniques for, e.g., hearing aid applications go together with the use of small-sized arrays. Considerable noise reduction can be achieved with such arrays, but at the expense of an increased sensitivity to errors in the assumed signal model such as microphone mismatch, reverberation, ... [1, 2]. In hearing aids, microphones are rarely matched in gain and phase. In [3], for example, gain and phase differences between microphone characteristics of up to 6 dB and 10°, respectively, have been reported.

A widely studied multi-channel adaptive noise reduction algorithm is the Generalized Sidelobe Canceller (GSC) [2]-[11], depicted in Figure 1. The GSC consists of a fixed, spatial pre-processor, which includes a fixed beamformer and a blocking matrix, and an adaptive stage based on an Adaptive Noise Canceller (ANC) [12]. The ANC minimizes the output noise power while the blocking matrix should avoid speech leakage into the noise references. The standard GSC assumes the desired speaker location, the microphone characteristics and positions to be known, and reflections of the speech signal to be absent. If these assumptions are fulfilled, it provides an undistorted enhanced speech signal with minimum residual noise. However, in reality these assumptions are often violated, resulting in so-called speech leakage and hence speech distortion. To limit speech distortion, the ANC is adapted during periods of noise only [7, 10, 13]. When used in combination with small-sized arrays, e.g., in hearing aid applications, an additional robustness constraint [9, 10, 14, 15] is required to guarantee performance in the presence of small errors in the assumed signal model, such as microphone mismatch [16, 17]. A widely applied method consists of imposing a Quadratic Inequality Constraint on the ANC (QIC-GSC) [10, 14, 15, 18, 19]. For LMS updating, the Scaled Projection Algorithm (SPA) [14] is a simple and effective technique that imposes this constraint. However, the QIC-GSC goes at the expense of less noise reduction [17].

In [20], a Multi-channel Wiener Filtering (MWF) technique has been proposed that provides a Minimum Mean Square Error (MMSE) estimate of the desired signal portion in one of the received microphone signals [21]-[24]. In contrast to the ANC of the GSC, the MWF is able to take speech distortion into account in its optimization criterion. The MMSE optimization criterion of the MWF can also be generalized to allow for a trade-off between speech distortion and noise reduction. We will refer to this generalization as Speech Distortion Weighted MWF (SDW-MWF). The MWF technique is uniquely based on estimates of the second order statistics of the recorded speech signal and the noise signal. A robust speech detection is thus (again) needed. In contrast to the GSC, the MWF does not make any a priori assumptions about the signal model, so that no or a less severe robustness constraint is needed to guarantee performance when used in combination with small-sized arrays [16, 17]. Especially in complicated noise scenarios such as multiple noise sources or diffuse noise, the MWF outperforms the GSC, even when the GSC is supplemented with a robustness constraint [17].

In [20, 21], the implementation of the MWF is based on a Generalized Singular Value Decomposition (GSVD) of an input data matrix and a noise data matrix. A cheaper alternative based on a QR Decomposition (QRD) has been proposed in [22]. A subband implementation [23] results in improved intelligibility at a significantly lower cost compared to the fullband approach. However, in contrast to the GSC and the QIC-GSC [14], no efficient, cheap stochastic gradient based implementation of the (SDW-)MWF, which avoids the use of expensive matrix computations, is available yet. In [25], an LMS based algorithm for the MWF has been developed. The algorithm needs recordings of calibration signals. Since room acoustics, microphone characteristics and the location of the desired speaker change over time, frequent re-calibration is required, making this approach cumbersome and expensive. In [26], an LMS based SDW-MWF has been proposed that avoids the need for calibration signals. The algorithm however relies on some independence assumptions that are not necessarily satisfied, resulting in degraded performance w.r.t. matrix-based implementations.

II. Summary

In the present invention, we establish a generalized multi-channel noise reduction scheme, referred to as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that encompasses the GSC and the MWF as extreme cases. In addition, the scheme allows for in-between solutions such as the Speech Distortion Regularized GSC (SDR-GSC). The generalized scheme, depicted in Figure 3, consists of a fixed, spatial pre-processor and an adaptive stage that is based on an SDW-MWF, hence the name Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF).

The SP-SDW-MWF adds robustness against signal model errors to the GSC by taking speech distortion explicitly into account in the design criterion of the adaptive stage. The SP-SDW-MWF is an alternative technique to the widely studied QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch, reverberation, ... A parameter $\mu$ is incorporated in the SP-SDW-MWF that allows for a trade-off between speech distortion and noise reduction. Focussing all attention towards speech distortion (i.e., setting $\mu = 0$) results in the output of the fixed beamformer. In noise scenarios with very low Signal-to-Noise Ratio (SNR), e.g., -10 dB, a fixed beamformer may be preferred. Adaptivity can then be easily reduced or excluded in the SP-SDW-MWF by decreasing the parameter $\mu$ to 0. Compared to the widely studied QIC-GSC, the SP-SDW-MWF achieves a better noise reduction performance for a given maximum allowable speech distortion level.

In [22, 27] recursive implementations of the (SDW-)MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be extended to implement the SP-SDW-MWF [29]. However, in contrast to the GSC and the QIC-GSC [14], no cheap stochastic gradient based implementation of the SP-SDW-MWF is available. In the present invention, we propose time-domain and frequency-domain stochastic gradient implementations of the SP-SDW-MWF that preserve the benefit of the matrix-based SP-SDW-MWF over QIC-GSC.

Below, the different embodiments of the present invention are described.

A first embodiment proposes a Speech Distortion Regularized GSC (SDR-GSC). A new design criterion is developed for the adaptive stage of the GSC: the ANC design criterion is supplemented with a regularization term that limits speech distortion due to signal model errors. In the SDR-GSC, a parameter $\mu$ is incorporated that allows for a trade-off between speech distortion and noise reduction. Focussing all attention to noise reduction results in the standard GSC, while, on the other hand, focussing all attention towards speech distortion results in the output of the fixed beamformer. In noise scenarios with low SNR, adaptivity in the SDR-GSC can be easily reduced or excluded by increasing attention towards speech distortion, i.e., by decreasing the parameter $\mu$ to 0. The SDR-GSC is an alternative technique to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch, reverberation, ... In contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction performance is obtained for small model errors, while guaranteeing robustness against large model errors.

In a second embodiment, we further improve the noise reduction performance of the SDR-GSC by adding an extra adaptive filtering operation $\mathbf{w}_0$ on the speech reference signal. We refer to this generalized scheme as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in Figure 3 and encompasses the MWF [20] as a special case. Again, a parameter $\mu$ is incorporated in the design criterion to allow for a trade-off between speech distortion and noise reduction. Focussing all attention to speech distortion results in the output of the fixed beamformer. Also here, adaptivity can be easily reduced or excluded by decreasing $\mu$ to 0. It is shown that, in the absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade of an SDR-GSC with an SDW single-channel Wiener postfilter (SDW-SWF) [30] and thus outperforms the SDR-GSC. In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its performance: compared to an SDR-GSC (with SDW-SWF postfilter), the SP-SDW-MWF then contains extra filtering operations that compensate for the performance degradation of the SDR-GSC (with SDW-SWF) due to speech leakage (see also Figure 4). In contrast to the SDR-GSC (and thus also the GSC), performance does not degrade due to microphone mismatch. In [22, 27] recursive implementations of the (SDW-)MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF.

In a third embodiment, we propose cheap time-domain and frequency-domain stochastic gradient implementations of the SDR-GSC and SP-SDW-MWF. Starting from the design criterion of the SDR-GSC, or more generally, the SP-SDW-MWF, we derive a time-domain stochastic gradient algorithm. In addition, we modify the LMS based algorithm [26] so that it applies to the SP-SDW-MWF. To increase convergence speed and reduce complexity, a frequency-domain implementation is proposed. Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise scenarios. We show that the excess error in the stochastic gradient algorithm is reduced by applying a low pass filter to the part of the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in time-varying noise scenarios. The stochastic gradient SP-SDW-MWF outperforms the LMS based algorithm, while complexity is not increased. Experimental results show that the low pass filtering significantly improves the performance of the stochastic gradient algorithm and does not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over QIC-GSC. The limited computational cost and the better noise reduction performance of the proposed algorithm make it a good alternative to the SPA [14] for implementation in hearing aids.



Brief Description of the Drawings



A number of embodiments of the present invention, together with some aspects of the prior art, will now be described with reference to the drawings, in which:

Fig. 1 depicts the concept of a Generalized Sidelobe Canceller;

Fig. 2 depicts an equivalent approach of multi-channel Wiener filtering;

Fig. 3 depicts a Spatially Pre-processed SDW-MWF;

Fig. 4 depicts the decomposition of SP-SDW-MWF with $\mathbf{w}_0$ in a multi-channel filter $\mathbf{w}_d$ and single-channel postfilter $\mathbf{e}_1 - \mathbf{w}_0$;

Fig. 5 shows the influence of $1/\mu$ on the performance of the SDR-GSC for different gain mismatches at the second microphone;

Fig. 6 shows the influence of $1/\mu$ on the performance of the SP-SDW-MWF with $\mathbf{w}_0$ for different gain mismatches $\Upsilon_2$ at the second microphone;

Fig. 7 shows the $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ for QIC-GSC as a function of $\beta^2$ for different gain mismatches $\Upsilon_2$ at the second microphone;

Fig. 8 depicts the complexity of TD and FD Stochastic Gradient (SG) algorithm with LP filtering as a function of filter length $L$ per channel; $M = 3$ (for comparison, the complexity of the standard NLMS ANC and SPA are depicted too);

Fig. 9 depicts the performance of different FD Stochastic Gradient (FD-SG) algorithms; (a) Stationary speech-like noise at 90°; (b) Multi-talker babble noise at 90°;

Fig. 10 depicts the influence of LP filter on performance of FD stochastic gradient SP-SDW-MWF ($1/\mu = 0.5$) without $\mathbf{w}_0$ and with $\mathbf{w}_0$. Babble noise at 90°;

Fig. 11 depicts the convergence behavior of FD-SG for $\lambda = 0$ and $\lambda = 0.9998$. The noise source position suddenly changes from 90° to 180° and vice versa;

Fig. 12 depicts the performance of FD stochastic gradient implementation of SP-SDW-MWF with LP ($\lambda = 0.9998$) in a multiple noise source scenario; and

Fig. 13 depicts the performance of FD SPA in a multiple noise source scenario.

Detailed Description 

Before the invention is described in detail, the prior art GSC [4] and the QIC-GSC [14, 19] will be reviewed under section 1. Under section 2, the Multi-channel Wiener Filter (MWF) technique will be discussed [20].



1 Generalized Sidelobe Canceller (GSC)



1.1 Concept 

Figure 1 describes the concept of the Generalized Sidelobe Canceller (GSC) [4], which consists of a fixed, spatial pre-processor, i.e., a fixed beamformer $A(z)$ and a blocking matrix $B(z)$, and an ANC. Given $M$ microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \tag{1}$$

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed beamformer $A(z)$ (e.g., delay-and-sum) creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k] \tag{2}$$

by steering a beam towards the direction of the desired signal, with a speech contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. In the sequel, an endfire array is assumed and the desired speaker is assumed to be in front at 0°. The blocking matrix $B(z)$ creates $M - 1$ so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \tag{3}$$

by steering zeroes towards the front so that the noise contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$. In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and noise contribution of a signal. During periods of speech + noise, the references $y_i[k]$, $i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, $y_i[k]$, $i = 0, \ldots, M-1$ only consist of a noise component, i.e., $y_i[k] = y_i^n[k]$. The second order statistics of the noise signal are assumed to be quite stationary such that they can be estimated during periods of noise only.

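To design the fixed, spatial pre-processor, assumptions are made about the microphone characteristics, the speaker position and the microphone positions, and furthermore reverberation is assumed to be absent. If these assumptions are satisfied, the noise references do not contain any speech, i.e., $y_i^s[k] = 0$ for $i = 1, \ldots, M-1$. However, in practice these assumptions are often violated (e.g., due to microphone mismatch and reverberation) so that speech leaks into the noise references. To limit the effect of such signal leakage, the ANC $\mathbf{w}_{1:M-1} \in \mathbb{C}^{(M-1)L \times 1}$ ¹

$$\mathbf{w}_{1:M-1}^H = \begin{bmatrix} \mathbf{w}_1^H & \mathbf{w}_2^H & \cdots & \mathbf{w}_{M-1}^H \end{bmatrix} \tag{4}$$

where

$$\mathbf{w}_i = \begin{bmatrix} w_i[0] & w_i[1] & \cdots & w_i[L-1] \end{bmatrix}^T, \tag{5}$$

is adapted during periods of noise only [7, 13]. Hence, the ANC $\mathbf{w}_{1:M-1}$ minimizes the output noise power, i.e.,

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \mathcal{E}\left\{ \left| y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H[k]\, \mathbf{y}_{1:M-1}^n[k] \right|^2 \right\} \tag{6}$$

and equals

$$\mathbf{w}_{1:M-1} = \mathcal{E}\left\{ \mathbf{y}_{1:M-1}^n[k]\, \mathbf{y}_{1:M-1}^{n,H}[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{y}_{1:M-1}^n[k]\, y_0^{n,*}[k-\Delta] \right\}, \tag{7}$$

where

$$\mathbf{y}_{1:M-1}^{n,H}[k] = \begin{bmatrix} \mathbf{y}_1^{n,H}[k] & \mathbf{y}_2^{n,H}[k] & \cdots & \mathbf{y}_{M-1}^{n,H}[k] \end{bmatrix}, \tag{8}$$

$$\mathbf{y}_i^n[k] = \begin{bmatrix} y_i^n[k] & y_i^n[k-1] & \cdots & y_i^n[k-L+1] \end{bmatrix}^T, \tag{9}$$

and where $\Delta$ is a delay applied to the speech reference to allow for non-causal taps in the filter $\mathbf{w}_{1:M-1}$. The delay $\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ returns the smallest integer equal or larger than $x$. The subscript $1:M-1$ in $\mathbf{w}_{1:M-1}$ and $\mathbf{y}_{1:M-1}$ refers to the subscripts of the first and last channel component of the adaptive filter and input vector, respectively.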
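The noise-only adaptation of the ANC can be illustrated with a small sketch. It assumes a time-domain implementation with real signals, a single noise reference ($M = 2$) and an NLMS update; the function name `anc_nlms`, the step size `rho`, the regularization `eps` and the externally supplied `noise_only` flags (from some speech detector) are hypothetical choices made for this example only and are not prescribed by the text.

```python
def anc_nlms(y0, y1, noise_only, L=4, delta=2, rho=0.5, eps=1e-8):
    """Sketch of an ANC with one noise reference, adapted by NLMS.

    y0         : samples of the speech reference
    y1         : samples of the (single) noise reference
    noise_only : booleans, True where sample k lies in a noise-only period
    Returns the output z[k] = y0[k - delta] - w^T y1_vec[k] and the final w.
    """
    w = [0.0] * L
    z = []
    for k in range(len(y0)):
        # Tap-delay line [y1[k], y1[k-1], ..., y1[k-L+1]], zero-padded at the start.
        x = [y1[k - j] if k - j >= 0 else 0.0 for j in range(L)]
        # Delayed speech reference (the delay allows for non-causal taps).
        d = y0[k - delta] if k - delta >= 0 else 0.0
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        z.append(e)
        # The filter is only updated while no speech is present.
        if noise_only[k]:
            norm = sum(xi * xi for xi in x) + eps
            w = [wi + rho * e * xi / norm for wi, xi in zip(w, x)]
    return z, w
```

During speech + noise periods the filter is frozen, so any speech that leaks into `y1` is still filtered and subtracted from the speech reference; this is the distortion mechanism that motivates the robustness constraints discussed next.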

Under ideal conditions ($y_i^s[k] = 0$, $i = 1, \ldots, M-1$), the GSC minimizes the residual noise while not distorting the desired speech signal, i.e., $z^s[k] = y_0^s[k-\Delta]$. However, when used in combination with small-sized arrays, a small error in the assumed signal model (hence $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$) already suffices to produce a significantly distorted output speech signal $z^s[k]$

$$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H\, \mathbf{y}_{1:M-1}^s[k], \tag{10}$$

even when only adapting during noise-only periods, so a robustness constraint on $\mathbf{w}_{1:M-1}$ is required [17]. In addition, the fixed beamformer $A(z)$ should be designed so that the distortion in the speech reference $y_0^s[k]$ is minimal for all possible model errors. In the sequel, a delay-and-sum beamformer is used. For small-sized arrays, this beamformer offers sufficient robustness against signal model errors, as it minimizes the white noise gain or noise sensitivity ². Given statistical knowledge about the signal model errors that occur in practice, further optimized beamformers can be designed, e.g., using the techniques in [31].

¹ In a time-domain implementation, the input signals of the adaptive filter $\mathbf{w}_{1:M-1}$ and the filter $\mathbf{w}_{1:M-1}$ are real. Hence, $\mathbf{w}_{1:M-1}^H = \mathbf{w}_{1:M-1}^T$. In the sequel, the formulas are generalized to complex input signals, so that they can also be applied to a subband implementation.

² The noise sensitivity is defined as the ratio of the spatially white noise gain to the gain of the desired signal and is often used to quantify the sensitivity of an algorithm against errors in the assumed signal model [2, 14].



1.2 Quadratic Inequality Constraint (QIC-GSC)

A common approach to increase the robustness of the GSC is to apply a Quadratic Inequality Constraint (QIC) [9]-[14, 19] to the ANC filter $\mathbf{w}_{1:M-1}$, so that the optimization criterion (6) of the GSC is modified into

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \mathcal{E}\left\{ \left| y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H[k]\, \mathbf{y}_{1:M-1}^n[k] \right|^2 \right\} \quad \text{subject to} \quad \mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \leq \beta^2. \tag{11}$$

The QIC avoids excessive growth of the filter coefficients $\mathbf{w}_{1:M-1}$. Hence, it reduces the undesired speech distortion when speech leaks into the noise references. In [14, 19], it is shown that, for a GSC with a blocking matrix $B(f)$ that satisfies $B^H(f)B(f) = I$, the QIC on the ANC filters corresponds to a constraint on the noise sensitivity.

In [14], the QIC-GSC is implemented by using the adaptive scaled projection algorithm: at each update step, the quadratic constraint is applied to the newly obtained ANC filter by scaling the filter coefficients by $\beta / \|\mathbf{w}_{1:M-1}\|$ when $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$. Although this technique works well for LMS updating, it does not appear to be as effective for RLS [19]. Recently, Tian et al. implemented the quadratic constraint by using variable loading [19]. For RLS, this technique provides a better approximation to the optimal solution (11) than the scaled projection algorithm. For LMS, variable loading does not appear to offer any performance advantage over the cheaper, scaled projection LMS.
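The scaling step of the scaled projection algorithm is simple enough to sketch in a few lines. The following is a minimal illustration for real-valued coefficients; the function names `scaled_projection` and `constrained_lms_step`, and the plain (unnormalized) LMS update with step size `rho`, are assumptions made for the example rather than the exact procedure of [14].

```python
import math


def scaled_projection(w, beta):
    """Scale w back onto the ball ||w|| <= beta if it lies outside of it."""
    norm = math.sqrt(sum(c * c for c in w))
    if norm > beta:
        return [c * beta / norm for c in w]
    return list(w)


def constrained_lms_step(w, x, e, rho, beta):
    """One LMS update w + rho * e * x followed by the scaled projection."""
    updated = [wi + rho * e * xi for wi, xi in zip(w, x)]
    return scaled_projection(updated, beta)
```

Because the projection only rescales the coefficient vector, its cost is one norm computation per update, which is why it is cheap compared to variable loading.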

2 Multi-channel Wiener filtering (MWF)

2.1 Concept

Recently, a Multi-channel Wiener filtering (MWF) technique has been proposed that provides a Minimum Mean Square Error (MMSE) estimate of the desired signal portion in one of the received microphone signals [21, 22, 23, 24]. In contrast to the GSC, this filtering technique does not make any a priori assumptions about the signal model and is found to be more robust [16, 17, 21]. Especially in complicated noise scenarios such as multiple noise sources or diffuse noise, the MWF outperforms the GSC, even when the GSC is supplied with a robustness constraint [17].

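The MWF $\mathbf{w}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimizes the Mean Square Error (MSE) between a delayed version of the (unknown) speech signal $u_i^s[k-\Delta]$ at the $i$-th (e.g., first) microphone and the sum $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the $M$ filtered, received microphone signals:

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} \mathcal{E}\left\{ \left| u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k] \right|^2 \right\}, \tag{12}$$

leading to

$$\mathbf{w}_{1:M} = \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, u_i^{s,*}[k-\Delta] \right\} \tag{13}$$

with

$$\mathbf{w}_{1:M}^H = \begin{bmatrix} \mathbf{w}_1^H & \mathbf{w}_2^H & \cdots & \mathbf{w}_M^H \end{bmatrix}, \tag{14}$$

$$\mathbf{u}_{1:M}^H[k] = \begin{bmatrix} \mathbf{u}_1^H[k] & \mathbf{u}_2^H[k] & \cdots & \mathbf{u}_M^H[k] \end{bmatrix}, \tag{15}$$

$$\mathbf{u}_i[k] = \begin{bmatrix} u_i[k] & u_i[k-1] & \cdots & u_i[k-L+1] \end{bmatrix}^T. \tag{16}$$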

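An equivalent approach consists in estimating a delayed version of the (unknown) noise signal $u_i^n[k-\Delta]$ in the $i$-th microphone, resulting in

$$\tilde{\mathbf{w}}_{1:M} = \arg\min_{\tilde{\mathbf{w}}_{1:M}} \mathcal{E}\left\{ \left| u_i^n[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k] \right|^2 \right\} \tag{17}$$

and

$$\tilde{\mathbf{w}}_{1:M} = \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\}^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, u_i^{n,*}[k-\Delta] \right\}, \tag{18}$$

where

$$\tilde{\mathbf{w}}_{1:M}^H = \begin{bmatrix} \tilde{\mathbf{w}}_1^H & \tilde{\mathbf{w}}_2^H & \cdots & \tilde{\mathbf{w}}_M^H \end{bmatrix}. \tag{19}$$

The estimate of the speech component $u_i^s[k-\Delta]$ is then obtained by subtracting the estimate $\hat{u}_i^n[k-\Delta] = \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]$ from the delayed, $i$-th microphone signal $u_i[k-\Delta]$, i.e.,

$$\hat{u}_i^s[k-\Delta] = u_i[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]. \tag{20}$$

This is depicted in Figure 2 for $u_i^s[k-\Delta] = u_1^s[k-\Delta]$. Using (13) and (18), it can be easily shown that

$$\mathbf{w}_{1:M} + \tilde{\mathbf{w}}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}, \tag{21}$$

with $\mathbf{e}_l$ the canonical vector, defined as

$$\mathbf{e}_l = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}^T, \quad \text{with the 1 at position } l. \tag{22}$$

This shows that the two approaches indeed lead to exactly the same speech signal estimate. A procedure for computing $\mathbf{w}_{1:M}$ or $\tilde{\mathbf{w}}_{1:M}$ will be given in Section 2.3.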
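The step behind (21) can be spelled out as follows. This is a short sketch in which $\mathbf{w}_{1:M}$ denotes the speech-estimating filter of (13) and $\tilde{\mathbf{w}}_{1:M}$ the noise-estimating filter of (18); the shorthand $\mathbf{R}_u = \mathcal{E}\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}$ and the symbol $\mathbf{e}$ for the canonical vector that selects the entry $u_i[k-\Delta]$ of $\mathbf{u}_{1:M}[k]$ are introduced here for convenience only.

```latex
\begin{align*}
\mathbf{w}_{1:M} + \tilde{\mathbf{w}}_{1:M}
  &= \mathbf{R}_u^{-1}\,
     \mathcal{E}\bigl\{\mathbf{u}_{1:M}[k]\,
       \bigl(u_i^{s,*}[k-\Delta] + u_i^{n,*}[k-\Delta]\bigr)\bigr\} \\
  &= \mathbf{R}_u^{-1}\,
     \mathcal{E}\bigl\{\mathbf{u}_{1:M}[k]\,u_i^{*}[k-\Delta]\bigr\} \\
  &= \mathbf{R}_u^{-1}\,\mathbf{R}_u\,\mathbf{e} \;=\; \mathbf{e}.
\end{align*}
```

The last line uses the fact that $\mathcal{E}\{\mathbf{u}_{1:M}[k]\,u_i^{*}[k-\Delta]\}$ is exactly the column of $\mathbf{R}_u$ that corresponds to the sample $u_i[k-\Delta]$, so no assumption on the statistics of speech and noise is needed for this identity.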



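2.2 Trade-off speech distortion versus noise reduction (SDW-MWF)

The residual error energy equals

$$\mathcal{E}\{|e[k]|^2\} = \mathcal{E}\left\{ \left| u_i^n[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k] \right|^2 \right\}, \tag{23}$$

and can be decomposed as

$$\underbrace{\mathcal{E}\left\{ \left| u_i^n[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k] \right|^2 \right\}}_{\varepsilon_n^2} + \underbrace{\mathcal{E}\left\{ \left| \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k] \right|^2 \right\}}_{\varepsilon_d^2} \tag{24}$$

where $\varepsilon_d^2$ equals the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The design criterion of the MWF can be generalized to allow for a trade-off between speech distortion and noise reduction, by incorporating a weighting factor $\mu$ [20] with $\mu \in [0, \infty]$:

$$\tilde{\mathbf{w}}_{1:M} = \arg\min_{\tilde{\mathbf{w}}_{1:M}} \mathcal{E}\left\{ \left| u_i^n[k-\Delta] - \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k] \right|^2 \right\} + \frac{1}{\mu}\, \mathcal{E}\left\{ \left| \tilde{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k] \right|^2 \right\}. \tag{25}$$

The solution of (25) is given by

$$\tilde{\mathbf{w}}_{1:M} = \left( \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\} + \frac{1}{\mu}\, \mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \right\} \right)^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \right\}, \tag{26}$$

which corresponds to the Wiener formula with an adjustable input noise level. Note that (18) is obtained with $\mu = 1$ and that (21) still applies. The filter (26) corresponds to the time-domain constrained estimator proposed in [32], which optimizes the following criterion: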



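$$\min_{\tilde{\mathbf{w}}_{1:M}} \varepsilon_d^2 \quad \text{subject to} \quad \varepsilon_n^2 \leq \alpha\, \mathcal{E}\left\{ \left| u_i^n[k-\Delta] \right|^2 \right\} \tag{27}$$

where $0 < \alpha < 1$ and $\mu$ is the Lagrange multiplier.

Equivalently, the optimization criterion (12) for $\mathbf{w}_{1:M}$ can be modified into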



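$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} \mathcal{E}\left\{ \left| \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k] \right|^2 \right\} + \frac{1}{\mu}\, \mathcal{E}\left\{ \left| u_i^s[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k] \right|^2 \right\}, \tag{28}$$

resulting in

$$\mathbf{w}_{1:M} = \left( \mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \right\} + \mu\, \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\} \right)^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, u_i^{s,*}[k-\Delta] \right\}. \tag{29}$$

In the sequel, we will refer to (29) as the Speech Distortion Weighted Multi-channel Wiener Filter (SDW-MWF).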

The factor $\mu \in [0, \infty]$ trades off speech distortion versus noise reduction. If $\mu = 1$, the MMSE criterion (12) or (17) is obtained. If $\mu > 1$, the residual noise level will be reduced at the expense of increased speech distortion. By setting $\mu$ to $\infty$, all emphasis is put on noise reduction and speech distortion is completely ignored. This results in $\mathbf{w}_{1:M} = \mathbf{0}$ for the speech-estimating filter or, equivalently, $\tilde{\mathbf{w}}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}$ for the noise-estimating filter, which means that the output signal equals 0. Setting $\mu$ to 0, on the other hand, results in $\mathbf{w}_{1:M} = \mathbf{e}_{(i-1)L+\Delta}$ or $\tilde{\mathbf{w}}_{1:M} = \mathbf{0}$ and hence in no noise reduction.
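The effect of the weighting factor is easiest to see in a scalar toy case. The sketch below assumes a single microphone with a single filter tap and no delay ($M = 1$, $L = 1$, $\Delta = 0$), real signals, and uncorrelated speech and noise with powers `ps` and `pn`. Under the reading in which the factor weights the noise term, the speech-estimating SDW-MWF then reduces to the gain `ps / (ps + mu * pn)`. The function names are hypothetical and only serve this illustration.

```python
def sdw_gain(ps, pn, mu):
    """Scalar SDW-MWF gain for speech power ps, noise power pn and weight mu."""
    return ps / (ps + mu * pn)


def sdw_energies(ps, pn, mu):
    """Return (speech distortion energy, residual noise energy) for that gain."""
    w = sdw_gain(ps, pn, mu)
    distortion = (1.0 - w) ** 2 * ps   # energy of u^s - w * u^s
    residual_noise = w ** 2 * pn       # energy of w * u^n
    return distortion, residual_noise
```

For equal speech and noise power (`ps = pn = 1`), a weight of 0 gives a gain of 1 (no distortion, no noise reduction), a weight of 1 gives the classical Wiener gain of 0.5, and larger weights lower the residual noise further while the distortion grows.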

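2.3 Implementation of MWF

In practice, the correlation matrix $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ is unknown. During periods of speech, the inputs $u_i[k]$ consist of speech + noise, i.e., $u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1, \ldots, M$. During periods of noise, only the noise component $u_i^n[k]$ is observed. Assuming that the speech and noise signal are uncorrelated, $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ can be estimated as

$$\mathcal{E}\left\{ \mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k] \right\} = \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\} - \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\}, \tag{30}$$

where the second order statistics $\mathcal{E}\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\}$ are estimated during speech + noise and the statistics $\mathcal{E}\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\}$ during periods of noise only. Like for the GSC, a robust speech detection is thus needed. Using (30), (26) and (29) can be re-written as:

$$\tilde{\mathbf{w}}_{1:M} = \left( \frac{1}{\mu}\, \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\} + \left(1 - \frac{1}{\mu}\right) \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\} \right)^{-1} \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \right\} \tag{31}$$

and

$$\mathbf{w}_{1:M} = \left( \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k] \right\} + (\mu - 1)\, \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k] \right\} \right)^{-1} \left( \mathcal{E}\left\{ \mathbf{u}_{1:M}[k]\, u_i^{*}[k-\Delta] \right\} - \mathcal{E}\left\{ \mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta] \right\} \right). \tag{32}$$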
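The subtraction of noise statistics from speech + noise statistics in (30) can be sketched as follows. The example assumes real-valued input vectors, simple sample averages in place of the expectations, and per-frame noise-only flags supplied by some speech detector; the helper names `outer_mean` and `speech_corr` are hypothetical and chosen for this illustration.

```python
def outer_mean(vectors):
    """Sample average of the outer products v v^T over a list of real vectors."""
    n = len(vectors[0])
    acc = [[0.0] * n for _ in range(n)]
    for v in vectors:
        for i in range(n):
            for j in range(n):
                acc[i][j] += v[i] * v[j]
    return [[a / len(vectors) for a in row] for row in acc]


def speech_corr(frames, noise_only):
    """Estimate the speech correlation matrix as R_u - R_n.

    frames     : list of stacked input vectors u[k]
    noise_only : booleans, True where frame k contains noise only
    """
    r_u = outer_mean([f for f, flag in zip(frames, noise_only) if not flag])
    r_n = outer_mean([f for f, flag in zip(frames, noise_only) if flag])
    return [[a - b for a, b in zip(row_u, row_n)]
            for row_u, row_n in zip(r_u, r_n)]
```

With finite data the difference of two sample estimates is not guaranteed to be positive semi-definite, and the estimate is only meaningful if the noise statistics stay roughly stationary between the noise-only and the speech + noise frames.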



In [21], the Wiener filter is computed at each time instant $k$ by means of a Generalized Singular Value Decomposition (GSVD) of a speech + noise and a noise data matrix. A cheaper recursive alternative based on a QR-decomposition has been proposed in [22]. In [23, 24], a subband implementation has been developed to increase intelligibility and reduce complexity, making it suitable for hearing aid applications.

Finally, note that instead of estimating $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ online using (30), a predetermined estimate of $\mathcal{E}\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ is sometimes used [25, 33]. In [25], this estimate is derived from clean speech recordings measured during an initial calibration phase. Additional recordings of the source speech signal allow to produce an estimate of the non-reverberant source speech signal instead of an estimate of the reverberant speech component in one of the microphone signals. However, since the room acoustics, the position of the desired speaker and the microphone characteristics may change over time, frequent re-calibration is required. In [33], a mathematical estimate of the correlation matrix and the correlation vector of the non-reverberant speech is exploited in which some signal model errors are taken into account.



In this Section, the present invention is described in detail.

In Section 3, the proposed adaptive multi-channel noise reduction technique, referred to as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener filter, is described.

A first embodiment is referred to as Speech Distortion Regularized GSC (SDR-GSC). A new design criterion is developed for the adaptive stage of the GSC: the ANC design criterion is supplemented with a regularization term that limits speech distortion due to signal model errors. In the SDR-GSC, a parameter $\mu$ is incorporated that allows for a trade-off between speech distortion and noise reduction. Focussing all attention to noise reduction results in the standard GSC, while, on the other hand, focussing all attention towards speech distortion results in the output of the fixed beamformer. In noise scenarios with low SNR, adaptivity in the SDR-GSC can be easily reduced or excluded by increasing attention towards speech distortion, i.e., by decreasing the parameter $\mu$ to 0. The SDR-GSC is an alternative technique to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors such as microphone mismatch, reverberation, ... In contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction performance is obtained for small model errors, while guaranteeing robustness against large model errors.

In a second embodiment, we further improve the noise reduction performance of the SDR-GSC by adding an extra adaptive filtering operation $\mathbf{w}_0$ on the speech reference signal. We refer to this generalized scheme as Spatially Pre-processed Speech Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF is depicted in Figure 3 and encompasses the MWF as a special case. Again, a parameter $\mu$ is incorporated in the design criterion to allow for a trade-off between speech distortion and noise reduction. Focussing all attention to speech distortion results in the output of the fixed beamformer. Also here, adaptivity can be easily reduced or excluded by decreasing $\mu$ to 0. It is shown that, in the absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade of an SDR-GSC with an SDW-SWF postfilter. In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its performance: compared to an SDR-GSC with SDW-SWF postfilter, the SP-SDW-MWF then contains extra filtering operations that compensate for the performance degradation of the SDR-GSC with SDW-SWF due to speech leakage. In contrast to the SDR-GSC (and thus also the GSC), performance does not degrade due to microphone mismatch. In [22, 27] recursive implementations of the (SDW-)MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques ³ can be extended to implement the SDR-GSC and, more generally, the SP-SDW-MWF.

³ The implementation based on GSVD can only be used for the SP-SDW-MWF with filter $\mathbf{w}_0$.

In a third embodiment, described in Section 4, we propose cheap time-domain and frequency-domain stochastic gradient implementations of the SDR-GSC and SP-SDW-MWF. Starting from the design criterion of the SDR-GSC, or more generally, the SP-SDW-MWF, we derive a time-domain stochastic gradient algorithm. In addition, we modify the LMS based algorithm [26] so that it applies to the SP-SDW-MWF. To increase convergence speed and reduce complexity, a frequency-domain implementation is proposed. Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise scenarios. We show that the excess error in the stochastic gradient algorithm is reduced by applying a low pass filter to the part of the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in time-varying noise scenarios. The stochastic gradient SP-SDW-MWF outperforms the LMS based algorithm, while complexity is not increased. Experimental results show that the low pass filtering significantly improves the performance of the stochastic gradient algorithm and does not compromise the tracking of changes in the noise scenario. In addition, experiments demonstrate that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over QIC-GSC. The limited computational cost and the better noise reduction performance of the proposed algorithm make it a good alternative to the SPA [14] for implementation in hearing aids.

3 Spatially pre-processed SDW Multi-channel Wiener filter

3.1 Concept

Figure 3 depicts the Spatially pre-processed, Speech Distortion Weighted Multi-channel Wiener filter (SP-SDW-MWF). The SP-SDW-MWF consists of a fixed, spatial pre-processor, i.e., a fixed beamformer $A(z)$ and a blocking matrix $B(z)$, and an adaptive Speech Distortion Weighted Multi-channel Wiener filter (SDW-MWF). Given $M$ microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \tag{33}$$

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed beamformer $A(z)$ creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k], \tag{34}$$

by steering a beam towards the direction of the desired signal, with a speech contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. The fixed beamformer $A(z)$ should be designed so that the distortion in the speech reference $y_0^s[k]$ is minimal for all possible errors in the assumed signal model, such as microphone mismatch. In the sequel, a delay-and-sum beamformer is used. For small-sized arrays, this beamformer offers sufficient robustness against signal model errors as it minimizes the white noise gain or noise sensitivity. Given statistical knowledge about the signal model errors that occur in practice, a further optimized beamformer $A(z)$ can be designed, e.g., using the techniques in [31].

The blocking matrix $B(z)$ creates $M - 1$ so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \tag{35}$$



by steering zeroes towards the front so that the noise contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$. A simple technique to create the noise references consists of pairwise subtracting the microphone signals that have been time-aligned for 0°. Using [31, 34], further optimized noise references can be created. Speech leakage can then be minimized for a specified angular region around 0° instead of for 0° only, e.g., for an angular region from -20° to 20°. In addition, given statistical knowledge about the signal model errors that occur in practice, speech leakage can be minimized for all possible model errors by using [31].
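A minimal sketch of such a fixed spatial pre-processor is given below. It assumes that the microphone samples have already been time-aligned for the 0° direction, so that the delay-and-sum beamformer reduces to an average and the blocking matrix to differences of adjacent channels; the function name `spatial_preprocessor` and the choice of adjacent pairs are assumptions made for this illustration.

```python
def spatial_preprocessor(u):
    """Fixed pre-processor for one time instant.

    u : list of M microphone samples, time-aligned for the look direction.
    Returns the speech reference y0 and the M - 1 noise references.
    """
    m = len(u)
    # Delay-and-sum on aligned signals: plain average over the microphones.
    y0 = sum(u) / m
    # Pairwise subtraction of adjacent aligned channels cancels a signal
    # that is identical on all microphones (ideal model, no mismatch).
    noise_refs = [u[i] - u[i + 1] for i in range(m - 1)]
    return y0, noise_refs
```

For three perfectly matched microphones that all observe a speech sample of 1.0, the noise references are exactly zero. If the second microphone has a gain of 1.1 instead, the differences become non-zero: this residual is the speech leakage caused by microphone mismatch.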

In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and noise contribution of a signal. During periods of speech + noise, the references $y_i[k]$, $i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, $y_i[k]$, $i = 0, \ldots, M-1$ only consist of a noise component, i.e., $y_i[k] = y_i^n[k]$. The second order statistics of the noise signal are assumed to be quite stationary such that they can be estimated during periods of noise only.

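The SDW-MWF filter $\mathbf{w}_{0:M-1}$

$$\mathbf{w}_{0:M-1} = \left(\frac{1}{\mu}\mathcal{E}\{\mathbf{y}^s_{0:M-1}[k]\,\mathbf{y}^{s,H}_{0:M-1}[k]\} + \mathcal{E}\{\mathbf{y}^n_{0:M-1}[k]\,\mathbf{y}^{n,H}_{0:M-1}[k]\}\right)^{-1} \mathcal{E}\{\mathbf{y}^n_{0:M-1}[k]\,y_0^{n,*}[k-\Delta]\} \tag{36}$$

with

$$\mathbf{w}^H_{0:M-1}[k] = \left[\,\mathbf{w}_0^H[k] \;\; \mathbf{w}_1^H[k] \;\; \cdots \;\; \mathbf{w}_{M-1}^H[k]\,\right], \tag{37}$$

$$\mathbf{w}_i[k] = \left[\,w_i[0] \;\; w_i[1] \;\; \cdots \;\; w_i[L-1]\,\right]^T, \tag{38}$$

$$\mathbf{y}^H_{0:M-1}[k] = \left[\,\mathbf{y}_0^H[k] \;\; \mathbf{y}_1^H[k] \;\; \cdots \;\; \mathbf{y}_{M-1}^H[k]\,\right], \tag{39}$$

$$\mathbf{y}_i[k] = \left[\,y_i[k] \;\; y_i[k-1] \;\; \cdots \;\; y_i[k-L+1]\,\right]^T, \tag{40}$$

provides an estimate $\mathbf{w}^H_{0:M-1}\mathbf{y}_{0:M-1}[k]$ of the noise contribution $y_0^n[k-\Delta]$ in the speech reference by minimizing the cost function $J(\mathbf{w}_{0:M-1})$

$$J(\mathbf{w}_{0:M-1}) = \frac{1}{\mu}\underbrace{\mathcal{E}\left\{\left|\mathbf{w}^H_{0:M-1}\mathbf{y}^s_{0:M-1}[k]\right|^2\right\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\left\{\left|y_0^n[k-\Delta] - \mathbf{w}^H_{0:M-1}\mathbf{y}^n_{0:M-1}[k]\right|^2\right\}}_{\varepsilon_n^2}. \tag{41}$$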



In a time-domain implementation, the input signals of the adaptive filter and the filter $\mathbf{w}_{0:M-1}$ are real and hence $\mathbf{w}^H_{0:M-1} = \mathbf{w}^T_{0:M-1}$. The formulas are generalized to complex input signals so that they can also be applied to a subband implementation.

The delay $\Delta$ is applied to the speech reference to allow for non-causal taps in the filter $\mathbf{w}$. Usually, it is set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ returns the smallest integer equal to or larger than $x$.






The subscript $0:M-1$ in $\mathbf{w}_{0:M-1}$ and $\mathbf{y}_{0:M-1}$ refers to the subscripts of the first and last channel component of the adaptive filter and input vector, respectively. The term $\varepsilon_d^2$ represents the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The term $\frac{1}{\mu}\varepsilon_d^2$ in the cost function (41) limits the possible amount of speech distortion at the output of the SP-SDW-MWF. Hence, the SP-SDW-MWF adds robustness against signal model errors to the GSC by taking speech distortion explicitly into account in the design criterion of the adaptive stage. The parameter $\frac{1}{\mu} \in [0, \infty)$ trades off between noise reduction and speech distortion: the larger $\frac{1}{\mu}$, the smaller the amount of possible speech distortion. For $\mu = 0$, the output of the fixed beamformer $\mathbf{A}(z)$, delayed by $\Delta$ samples, is obtained. In noise scenarios with very low Signal-to-Noise Ratio (SNR), a fixed beamformer may be preferred. Adaptivity can be easily reduced or excluded in the SP-SDW-MWF by decreasing $\mu$ to 0. Alternatively, adaptivity can be limited by applying a QIC to $\mathbf{w}_{0:M-1}$.
Note that when the fixed beamformer $\mathbf{A}(z)$ and the blocking matrix $\mathbf{B}(z)$ are set to



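$$\mathbf{A}(z) = \left[\,1 \;\; 0 \;\; \cdots \;\; 0\,\right]^H \tag{42}$$

$$\mathbf{B}^H(z) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \tag{43}$$

we obtain the original SDW-MWF that operates on the received microphone signals $u_i[k]$, $i = 1, \ldots, M$.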

Below, the different parameter settings of the SP-SDW-MWF are discussed. Depending on the setting of the parameter $\mu$ and the presence or absence of the filter $\mathbf{w}_0$, the GSC, the (SDW-)MWF as well as in-between solutions such as the Speech Distortion Regularized GSC (SDR-GSC) may be obtained. We distinguish between two cases, i.e., the case where no filter $\mathbf{w}_0$ is applied to the speech reference (filter length $L_0 = 0$) and the case where an additional filter $\mathbf{w}_0$ is used ($L_0 \neq 0$).

The adaptive stage of the SP-SDW-MWF can be implemented using the recursive QRD-based implementation of the SDW-MWF [22]. Like for the SDW-MWF, complexity can be reduced by a subband implementation [23]. For $L_0 \neq 0$, also the GSVD based algorithm [20] can be applied. Cheaper stochastic gradient based algorithms are proposed in Section 4.

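3.2 First embodiment: SDR-GSC, i.e., SP-SDW-MWF without $\mathbf{w}_0$

First, consider the case without $\mathbf{w}_0$, i.e., $L_0 = 0$. The solution for $\mathbf{w}_{1:M-1}$ in (36) then reduces to

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} \; \frac{1}{\mu}\underbrace{\mathcal{E}\left\{\left|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]\right|^2\right\}}_{\varepsilon_d^2} + \underbrace{\mathcal{E}\left\{\left|y_0^n[k-\Delta] - \mathbf{w}^H_{1:M-1}\mathbf{y}^n_{1:M-1}[k]\right|^2\right\}}_{\varepsilon_n^2} \tag{44}$$

leading to

$$\mathbf{w}_{1:M-1} = \left(\frac{1}{\mu}\mathcal{E}\{\mathbf{y}^s_{1:M-1}[k]\,\mathbf{y}^{s,H}_{1:M-1}[k]\} + \mathcal{E}\{\mathbf{y}^n_{1:M-1}[k]\,\mathbf{y}^{n,H}_{1:M-1}[k]\}\right)^{-1} \mathcal{E}\{\mathbf{y}^n_{1:M-1}[k]\,y_0^{n,*}[k-\Delta]\}, \tag{45}$$

where $\varepsilon_d^2$ is the speech distortion energy and $\varepsilon_n^2$ the residual noise energy.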

Remark Fbr Lq = 0, it is readily seen that (21) does not hold, te., wi:m-i + ^xM-x ^ ca yfhere 

because the speech component $\mathbf{y}^s_{1:M-1}[k]$ in the input to the adaptive filter $\mathbf{w}_{1:M-1}$ does not contain the estimated speech signal $y_0^s[k-\Delta]$.

If $\mu = 1$, the classical MMSE criterion (cfr. (17)) is obtained.

Compared to the optimization criterion (6) of the GSC, a regularization term

$$\frac{1}{\mu}\mathcal{E}\left\{\left|\mathbf{w}^H_{1:M-1}\mathbf{y}^s_{1:M-1}[k]\right|^2\right\} \tag{47}$$

has been added. This regularization term limits the amount of speech distortion that is caused by the filter $\mathbf{w}_{1:M-1}$ when speech leaks into the noise references, i.e., $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$. In the sequel, we therefore refer to the SP-SDW-MWF with $L_0 = 0$ as Speech Distortion Regularized GSC (SDR-GSC). The smaller $\mu$, the smaller the resulting amount of speech distortion will be. For $\mu = 0$, the output of the fixed beamformer $\mathbf{A}(z)$, delayed by $\Delta$ samples, is obtained. For $\mu = \infty$, all emphasis is put on noise reduction and speech distortion is not taken into account. This corresponds to the GSC. Hence, the SDR-GSC encompasses the GSC as a special case.

The regularization term (47) with $\frac{1}{\mu} \neq 0$ adds robustness to the GSC, while not affecting the noise reduction performance in the absence of speech leakage.

• In the absence of speech leakage, i.e., $y_i^s[k] = 0$, $i = 1, \ldots, M-1$, the regularization term equals 0 for all $\mathbf{w}_{1:M-1}$ and hence the residual noise energy $\varepsilon_n^2$ is effectively minimized. In other words, in the absence of speech leakage, the GSC solution is obtained.

• In the presence of speech leakage, i.e., $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$, speech distortion is taken into account in the optimization criterion (44) for the adaptive filter $\mathbf{w}_{1:M-1}$, limiting speech distortion while reducing noise. The larger the amount of speech leakage, the more attention is paid to speech distortion.

To limit speech distortion alternatively, a QIC is often imposed on the filter $\mathbf{w}_{1:M-1}$ (see Section 1.2). In contrast to the SDR-GSC, the QIC acts irrespective of the amount of speech leakage $\mathbf{y}^s[k]$ that is present. The constraint value $\beta^2$ in (11) has to be chosen based on the largest model errors that may occur. As a consequence, noise reduction performance is compromised even when no or very small model errors are present. Hence, the QIC is more conservative than the SDR-GSC. The experimental results in Section 3.4 confirm this.
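The two properties listed above can be checked numerically on a toy example. The sketch below assumes the closed-form solution (45) with real-valued, hand-picked correlation matrices; the helper names `sdr_gsc_filter` and `distortion` and all numerical values are hypothetical and only serve the illustration.

```python
import numpy as np

def sdr_gsc_filter(R_s, R_n, r_n, inv_mu):
    """Closed form (45): solve (1/mu * R_s + R_n) w = r_n.

    R_s, R_n: speech / noise correlation matrices of the noise references
    r_n:      cross-correlation between the noise references and the noise
              in the (delayed) speech reference
    inv_mu:   1/mu; inv_mu = 0 corresponds to the GSC
    """
    return np.linalg.solve(inv_mu * R_s + R_n, r_n)

# Hypothetical statistics for two noise references.
R_n = np.array([[2.0, 0.5], [0.5, 1.0]])
r_n = np.array([1.0, 0.3])
R_leak = np.array([[0.4, 0.1], [0.1, 0.2]])   # speech leakage statistics

def distortion(w):
    """Speech distortion energy eps_d^2 = w^H R_s w for the leakage case."""
    return float(w @ R_leak @ w)

w_gsc = sdr_gsc_filter(R_leak, R_n, r_n, 0.0)              # 1/mu = 0: GSC
w_reg = sdr_gsc_filter(R_leak, R_n, r_n, 2.0)              # regularized
w_noleak = sdr_gsc_filter(np.zeros((2, 2)), R_n, r_n, 2.0) # no speech leakage
```

Without speech leakage the regularized filter coincides with the GSC filter (`w_noleak` equals `w_gsc`), while with leakage the regularized filter yields a smaller speech distortion energy than the GSC filter.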



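3.3 Second embodiment: SP-SDW-MWF with filter $\mathbf{w}_0$

Since the SDW-MWF (36) takes speech distortion explicitly into account in its optimization criterion, an additional filter $\mathbf{w}_0$ on the speech reference $y_0[k]$ may be added. The SDW-MWF (36) then solves the following more general optimization criterion

$$\mathbf{w}_{0:M-1} = \arg\min_{\mathbf{w}_{0:M-1}} \; \frac{1}{\mu}\mathcal{E}\left\{\left|\begin{bmatrix}\mathbf{w}_0^H & \mathbf{w}^H_{1:M-1}\end{bmatrix}\begin{bmatrix}\mathbf{y}_0^s[k] \\ \mathbf{y}^s_{1:M-1}[k]\end{bmatrix}\right|^2\right\} + \mathcal{E}\left\{\left|y_0^n[k-\Delta] - \begin{bmatrix}\mathbf{w}_0^H & \mathbf{w}^H_{1:M-1}\end{bmatrix}\begin{bmatrix}\mathbf{y}_0^n[k] \\ \mathbf{y}^n_{1:M-1}[k]\end{bmatrix}\right|^2\right\} \tag{48}$$

where $\mathbf{w}^H_{0:M-1} = \left[\,\mathbf{w}_0^H \;\; \mathbf{w}^H_{1:M-1}\,\right]$ is given by (36).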

Again, $\mu$ trades off speech distortion and noise reduction. For $\mu = \infty$, speech distortion is completely ignored so that the solution becomes



(49) 

which results in a zero output signal. For $\mu = 0$, all attention is paid to speech distortion so that the output of the fixed beamformer, delayed by $\Delta$ samples, is obtained.

• In the absence of speech leakage, i.e., $y_i^s[k] = 0$ for $i = 1, \ldots, M-1$, and for infinitely long filters $\mathbf{w}_i$, $i = 0, \ldots, M-1$, the SP-SDW-MWF with $\mathbf{w}_0$ corresponds to the cascade of an SDR-GSC and an SDW Single-channel WF (SDW-SWF) postfilter [30, 35].

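Proof: In case of infinite filter lengths, the SP-SDW-MWF $\mathbf{W}_{0:M-1}(f)$ and its optimization criterion can be represented in the frequency-domain:

$$\mathbf{W}_{0:M-1}(f) = \arg\min_{\mathbf{W}_{0:M-1}} \; \frac{1}{\mu}\mathcal{E}\left\{\left|\begin{bmatrix} W_0^*(f) & \mathbf{W}^H_{1:M-1}(f)\end{bmatrix}\begin{bmatrix} Y_0^s(f) \\ \mathbf{Y}^s_{1:M-1}(f)\end{bmatrix}\right|^2\right\} + \mathcal{E}\left\{\left|\begin{bmatrix}\left(e^{-j2\pi f\Delta} - W_0^*(f)\right) & -\mathbf{W}^H_{1:M-1}(f)\end{bmatrix}\begin{bmatrix} Y_0^n(f) \\ \mathbf{Y}^n_{1:M-1}(f)\end{bmatrix}\right|^2\right\} \tag{50}$$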



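Without loss of generality, we assume, for reasons of simplicity, $\Delta = 0$.
Decompose $\mathbf{W}_{1:M-1}(f)$ as

$$\mathbf{W}_{1:M-1}(f) = \left(1 - W_0(f)\right)\mathbf{W}_{d,1:M-1}(f) \tag{51}$$

with $W_0(f)$ a single-channel and $\mathbf{W}_{d,1:M-1}(f)$ a multi-channel filter and define an intermediate output $V(f)$ (see also Figure 4) as

$$V(f) = Y_0(f) - \mathbf{W}^H_{d,1:M-1}(f)\,\mathbf{Y}_{1:M-1}(f). \tag{52}$$

Then, the cost function $J(W_0, \mathbf{W}_{d,1:M-1})$ of (50) can be rewritten as

$$J = \mathcal{E}\left\{\left|\left(1 - W_0^*(f)\right)V^n(f)\right|^2\right\} + \frac{1}{\mu}\mathcal{E}\left\{\left|W_0^*(f)V^s(f) + \mathbf{W}^H_{d,1:M-1}(f)\,\mathbf{Y}^s_{1:M-1}(f)\right|^2\right\}. \tag{53}$$

From $\frac{\partial}{\partial W_0} J(W_0, \mathbf{W}_{d,1:M-1}) = 0$, we find

$$W_0(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\mathcal{E}\{V^s V^{s,*}\}\right)^{-1}\left(\mathcal{E}\{V^n V^{n,*}\} - \frac{1}{\mu}\mathcal{E}\{V^s\,\mathbf{Y}^{s,H}_{1:M-1}\mathbf{W}_{d,1:M-1}\}\right). \tag{54}$$

This single-channel filter $W_0(f)$ consists of two terms.

- The first term

$$W_{0,1}(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\mathcal{E}\{V^s V^{s,*}\}\right)^{-1}\mathcal{E}\{V^n V^{n,*}\} \tag{55}$$

estimates the noise component $V^n(f)$ in the intermediate output $V(f)$. The filter $1 - W_{0,1}$ corresponds to an SDW Single-channel Wiener Filter (SDW-SWF) that estimates the speech component $V^s(f)$.

- The second term

$$W_{0,2}(f) = \left(\mathcal{E}\{V^n V^{n,*}\} + \frac{1}{\mu}\mathcal{E}\{V^s V^{s,*}\}\right)^{-1}\left(-\frac{1}{\mu}\mathcal{E}\{V^s\,\mathbf{Y}^{s,H}_{1:M-1}\mathbf{W}_{d,1:M-1}\}\right) \tag{56}$$

estimates the speech leakage filtered by $\mathbf{W}_{d,1:M-1}(f)$, i.e., $-\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$. The speech component in the intermediate output $V(f)$ equals $V^s(f) = Y_0^s(f) - \mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$. The filter $W_{0,2}(f)$ tries to compensate for the distortion $-\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$ by adding an estimate of $\mathbf{W}^H_{d,1:M-1}\mathbf{Y}^s_{1:M-1}$ to the output of the SDW-SWF.

In the absence of speech leakage (i.e., $\mathbf{Y}^s_{1:M-1} = 0$), the filter $W_{0,2}(f)$ equals zero and $1 - W_0(f)$ corresponds to an SDW-SWF.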



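From $\frac{\partial}{\partial \mathbf{W}_{d,1:M-1}} J(W_0, \mathbf{W}_{d,1:M-1}) = 0$, we obtain the following solution for $\mathbf{W}_{d,1:M-1}(f)$:

$$\mathbf{W}_{d,1:M-1}(f) = \left(\frac{1}{\mu}\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\mathbf{Y}^{s,H}_{1:M-1}\} + \mathcal{E}\{\mathbf{Y}^n_{1:M-1}\mathbf{Y}^{n,H}_{1:M-1}\}\right)^{-1}\left(\mathcal{E}\{\mathbf{Y}^n_{1:M-1}Y_0^{n,*}\} - \frac{1}{\mu}\frac{W_0(f)}{1 - W_0(f)}\mathcal{E}\{\mathbf{Y}^s_{1:M-1}Y_0^{s,*}\}\right). \tag{57}$$

Also the multi-channel filter $\mathbf{W}_{d,1:M-1}(f)$ consists of two terms.

- The first term corresponds to the SDR-GSC

$$\left(\frac{1}{\mu}\mathcal{E}\{\mathbf{Y}^s_{1:M-1}\mathbf{Y}^{s,H}_{1:M-1}\} + \mathcal{E}\{\mathbf{Y}^n_{1:M-1}\mathbf{Y}^{n,H}_{1:M-1}\}\right)^{-1}\mathcal{E}\{\mathbf{Y}^n_{1:M-1}Y_0^{n,*}\} \tag{58}$$

and estimates the noise component $Y_0^n(f)$ at the output of the fixed beamformer.

- The second term tries to compensate for the speech distortion $-W_0^*(f)Y_0^s(f)$ caused by $W_0(f)$ by adding an estimate of $\frac{W_0^*(f)}{1 - W_0^*(f)}Y_0^s(f)$ to the output of the SDR-GSC. Note that this corresponds to adding an estimate of $W_0^*(f)Y_0^s(f)$ to the output $Z(f)$ of the SP-SDW-MWF.

In the absence of speech leakage, $\mathbf{W}_{d,1:M-1}(f)$ corresponds to an SDR-GSC or a GSC.
Figure 4 illustrates graphically the solution for $\mathbf{W}_{d,1:M-1}(f)$ and $W_0(f)$ for $\Delta = 0$. In the absence of speech leakage, the filters that try to compensate for the speech distortion equal 0. Hence, the SP-SDW-MWF corresponds to an SDR-GSC (or GSC) with SDW-SWF postfilter. The SP-SDW-MWF achieves the same or a better Signal-to-Noise Ratio (SNR) improvement than the SDR-GSC, depending on the noise scenario.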

3.4 Experimental results 

This section illustrates the theoretical results of Section 3.2 and Section 3.3 by means of experimental results for a hearing aid application. Section 3.4.1 and Section 3.4.2, respectively, describe the set-up and the performance measures that are used. In Section 3.4.3, the impact of the different parameter settings of the SP-SDW-MWF on the performance and the sensitivity to signal model errors is evaluated. Comparison is made with the QIC-GSC.

3.4.1 Set-up 

A three-microphone Behind-The-Ear (BTE) hearing aid with three omnidirectional microphones (Knowles FG-3452) has been mounted on a dummy head in an office room. The interspacing $d$ between the first and the second microphone is about $d = 1$ cm and the interspacing between the second and the third microphone about 1.5 cm. The reverberation time $T_{60\mathrm{dB}}$ is about 700 ms for a speech weighted noise. The desired speech signal and the noise signals are uncorrelated. Both the speech and the noise signal have a level of 70 dB SPL at the center of the head. The desired speech source and the noise sources are positioned at a distance of 1 meter from the head: the speech source in front of the head, the noise sources at an angle $\theta$ with respect to the speech source. To get an idea of the average performance based on directivity only, stationary speech and noise signals with the same, average long-term power spectral density are used. The signals can be found on [36]. The total duration of the input signal is 10 seconds, of which 5 seconds contain noise only and 5 seconds contain both the speech and noise signal. For evaluation purposes, the speech and noise signal have been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [37] and the output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means of recordings of an anechoic speech weighted noise signal positioned at 0°, measured while the microphone array is mounted on the head. A delay-and-sum beamformer is used as a fixed beamformer since, in case of small microphone interspacing, it is robust to model errors. The blocking matrix $\mathbf{B}$ pairwise subtracts the time-aligned calibrated microphone signals.

To investigate the effect of the different parameter settings (i.e., $\mu$, $\mathbf{w}_0$) on the performance only, the filter coefficients are computed using (36) where $\mathcal{E}\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is estimated by means of the clean speech contributions of the microphone signals. In practice, $\mathcal{E}\{\mathbf{y}^s_{0:M-1}\mathbf{y}^{s,H}_{0:M-1}\}$ is approximated using (30). The effect of approximation (30) on the performance was found to be small (i.e., differences of at most 0.5 dB in intelligibility weighted signal-to-noise ratio improvement) for the given data set. The QIC-GSC is implemented using variable loading RLS [19]. The filter length $L$ per channel equals 96.

3.4.2 Performance measures

To assess the performance of the different approaches, the broadband intelligibility weighted signal-to-noise ratio improvement [38] is used, defined as

$$\Delta\mathrm{SNR}_{intellig} = \sum_i I_i\left(\mathrm{SNR}_{i,out} - \mathrm{SNR}_{i,in}\right), \tag{59}$$

where the band importance function $I_i$ expresses the importance of the $i$-th one-third octave band with center frequency $f_i^c$ for intelligibility, $\mathrm{SNR}_{i,out}$ is the output SNR (in dB) and $\mathrm{SNR}_{i,in}$ is the input SNR (in dB) in the $i$-th one-third octave band.

The intelligibility weighted signal-to-noise ratio reflects how much intelligibility is improved by the noise reduction algorithms, but does not take into account speech distortion.

To measure the amount of speech distortion, we define the following intelligibility weighted spectral distortion measure

$$\mathrm{SD}_{intellig} = \sum_i I_i\,\mathrm{SD}_i \tag{60}$$

with $\mathrm{SD}_i$ the average spectral distortion (dB) in the $i$-th one-third octave band, measured as

$$\mathrm{SD}_i = \frac{1}{\left(2^{1/6} - 2^{-1/6}\right)f_i^c}\int_{2^{-1/6}f_i^c}^{2^{1/6}f_i^c}\left|10\log_{10}G^s(f)\right|\,df, \tag{61}$$

with $G^s(f)$ the power transfer function of speech from the input to the output of the noise reduction algorithm. To exclude the effect of the spatial preprocessor, the performance measures are calculated w.r.t. the output of the fixed beamformer.
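The two broadband measures (59) and (60) are plain importance-weighted sums over the one-third octave bands. A minimal sketch follows; the three-band importance values and SNR figures are made up for the example and do not come from [38] or from the experiments.

```python
def weighted_sum(importance, per_band):
    """Sum_i I_i * value_i over the one-third octave bands."""
    return sum(I * v for I, v in zip(importance, per_band))

def delta_snr_intellig(importance, snr_out_db, snr_in_db):
    """Intelligibility weighted SNR improvement, as in (59)."""
    gains = [o - i for o, i in zip(snr_out_db, snr_in_db)]
    return weighted_sum(importance, gains)

def sd_intellig(importance, sd_db):
    """Intelligibility weighted spectral distortion, as in (60)."""
    return weighted_sum(importance, sd_db)

# Hypothetical three-band example (real band importance functions have more bands).
I = [0.2, 0.5, 0.3]
improvement = delta_snr_intellig(I, snr_out_db=[5.0, 10.0, 0.0], snr_in_db=[0.0, 0.0, 0.0])
distortion = sd_intellig(I, sd_db=[1.0, 2.0, 0.0])
```

In this example the middle band dominates both measures because it carries the largest importance weight.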

3.4.3 Experimental results

The impact of the different parameter settings for $\mu$ and $\mathbf{w}_0$ on the performance of the SP-SDW-MWF is illustrated. In addition, the influence of microphone mismatch, e.g., gain mismatch of the second microphone, on the performance is evaluated. Among the different possible signal model errors, microphone mismatch was found to be especially harmful to the performance of the GSC in a hearing aid application [17]. In hearing aids, microphones are rarely matched in gain and phase. In [3], gain and phase differences between microphone characteristics of up to 6 dB and 10°, respectively, have been reported.

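SP-SDW-MWF without $\mathbf{w}_0$ (SDR-GSC)

Figure 5 plots the improvement $\Delta\mathrm{SNR}_{intellig}$ and the speech distortion $\mathrm{SD}_{intellig}$ as a function of $\frac{1}{\mu}$ obtained by the SDR-GSC. In the absence of microphone mismatch, the amount of speech leakage into the noise references is limited. Hence, the amount of speech distortion is low for all $\mu$. Since there is still a small amount of speech leakage due to reverberation, the amount of noise reduction and speech distortion slightly decreases for increasing $\frac{1}{\mu}$, especially for $\frac{1}{\mu} > 1$. In the presence of microphone mismatch, the amount of speech leakage into the noise references grows. For $\frac{1}{\mu} = 0$ (GSC), the speech gets significantly distorted. Due to the cancellation of the desired signal, also the improvement $\Delta\mathrm{SNR}_{intellig}$ degrades. Setting $\frac{1}{\mu} > 0$ improves the performance of the GSC in the presence of model errors without compromising performance in the absence of signal model errors.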

SP-SDW-MWF with filter $\mathbf{w}_0$

Figure 6 plots the performance measures $\Delta\mathrm{SNR}_{intellig}$ and $\mathrm{SD}_{intellig}$ of the SP-SDW-MWF with filter $\mathbf{w}_0$. In general, the amount of speech distortion and noise reduction grows for decreasing $\frac{1}{\mu}$. For $\frac{1}{\mu} = 0$, all attention is paid to noise reduction. As also illustrated by Figure 6, this results in a total cancellation of the speech and the noise signal and hence degraded performance. In the absence of model errors, the settings $L_0 = 0$ and $L_0 \neq 0$ result, except for $\frac{1}{\mu} = 0$, in the same $\Delta\mathrm{SNR}_{intellig}$, while the distortion for the SP-SDW-MWF with $\mathbf{w}_0$ is higher due to the additional single-channel SDW-SWF. For $L_0 \neq 0$ the performance does not, in contrast to $L_0 = 0$, degrade due to the microphone mismatch.

Comparison with QIC 

Figure 7 depicts the improvement $\Delta\mathrm{SNR}_{intellig}$ and the speech distortion $\mathrm{SD}_{intellig}$, respectively, of the QIC-GSC as a function of $\beta^2$. Like the SDR-GSC, the QIC increases the robustness of the GSC. The QIC is independent of the amount of speech leakage. As a consequence, distortion grows fast with increasing gain deviation. The constraint value $\beta$ should be chosen so that the maximum permissible speech distortion level is not exceeded for the largest possible model errors. This goes at the expense of reduced noise reduction for small model errors. The SDR-GSC, on the other hand, keeps the speech distortion limited for all model errors (see Figure 5). Attention towards speech distortion is increased if the amount of speech leakage grows. As a result, a better noise reduction performance is obtained for small model errors, while guaranteeing sufficient robustness for large model errors. In addition, Figure 6 demonstrates that an additional filter $\mathbf{w}_0$ significantly improves the performance of the SP-SDW-MWF in the presence of signal model errors.

3.5 Conclusion

In the present invention, we established a generalized noise reduction scheme, referred to as Spatially pre-processed, Speech Distortion Weighted Multi-channel Wiener filter (SP-SDW-MWF), that consists of a fixed, spatial preprocessor and an adaptive stage that is based on an SDW-MWF. The new scheme encompasses the GSC and MWF as special cases. In addition, it allows for an in-between solution that can be interpreted as a Speech Distortion Regularized GSC. Depending on the setting of a trade-off parameter $\mu$ and the presence or absence of the filter $\mathbf{w}_0$ on the speech reference, the GSC, the SDR-GSC or a (SDW-)MWF is obtained. In Section 3.2 and Section 3.3, the different parameter settings of the SP-SDW-MWF have been interpreted.

• Without $\mathbf{w}_0$, the SP-SDW-MWF corresponds to an SDR-GSC: the ANC design criterion is supplemented with a regularization term that limits the speech distortion due to signal model errors. The larger $\frac{1}{\mu}$, the smaller the amount of distortion. For $\frac{1}{\mu} = 0$, distortion is ignored completely, which corresponds to the GSC solution. The SDR-GSC is then an alternative technique to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors. In contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech distortion when the amount of speech leakage grows. In the absence of signal model errors, the performance of the GSC is preserved. As a result, a better noise reduction performance is obtained for small model errors, while guaranteeing robustness against large model errors.



• Since the SP-SDW-MWF takes speech distortion explicitly into account, a filter $\mathbf{w}_0$ on the speech reference can be added. It is shown that, in the absence of speech leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a cascade of an SDR-GSC with an SDW-SWF postfilter. In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to preserve its performance: compared to an SDR-GSC with SDW-SWF postfilter, the SP-SDW-MWF then contains extra filtering operations that compensate for the performance degradation of the SDR-GSC with SDW-SWF due to speech leakage. In contrast to the SDR-GSC (and thus also the GSC), performance does not degrade due to microphone mismatch. Compared to the SDR-GSC, the same or a better SNR improvement can be achieved with $L_0 \neq 0$ thanks to the single-channel postfilter.

In Section 3.4, experimental results for a hearing aid application confirmed the theoretical results of Section 3.2 and Section 3.3. The SP-SDW-MWF indeed increases the robustness of the GSC against signal model errors. Comparison with the widely studied QIC-GSC demonstrated that the SP-SDW-MWF achieves a better noise reduction performance for a given maximum allowable speech distortion level.

4 Third embodiment: Stochastic gradient implementations 

In [22, 27], recursive implementations of the MWF have been proposed based on a GSVD or QR decomposition. A subband implementation [28] results in improved intelligibility at a significantly lower cost compared to the fullband approach. These techniques can be extended to implement the SP-SDW-MWF. However, in contrast to the GSC and the QIC-GSC, no cheap stochastic gradient based implementation of the SP-SDW-MWF is available. In [25], an LMS based algorithm for the MWF has been developed. The algorithm needs recordings of calibration signals. Since room acoustics, microphone characteristics and the location of the desired speaker change over time, frequent re-calibration is required, making this approach cumbersome and expensive. In [26], an LMS based SDW-MWF has been proposed that avoids the need for calibration signals. The algorithm however relies on some independence assumptions that are not necessarily satisfied. In the present invention, we propose time-domain and frequency-domain stochastic gradient implementations of the SP-SDW-MWF that preserve the benefit of the matrix-based SP-SDW-MWF over the QIC-GSC. The LMS based SDW-MWF of [26] is modified so that it applies to the SP-SDW-MWF scheme. In addition, other stochastic gradient algorithms are developed that achieve a better performance. Experimental results demonstrate that the proposed stochastic gradient implementations of the SP-SDW-MWF outperform the SPA, while their computational cost is limited.

This section is organized as follows. Starting from the cost function of the SP-SDW-MWF, a time-domain stochastic gradient algorithm is derived in Section 4.1. Applying the independence assumptions made in [26] results in an LMS based SP-SDW-MWF similar to [26]. To increase convergence and reduce complexity, the stochastic gradient and LMS based algorithm are implemented in the frequency-domain. Both the stochastic gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying noise scenarios. In Section 4.2, we show that the performance of the stochastic gradient algorithm is improved by applying a low pass filter to the part of the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying distortion of the desired speech component while not degrading the tracking performance needed in time-varying noise scenarios. Section 4.3 compares the performance of the different frequency-domain stochastic gradient algorithms. Experimental results show that the proposed stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the QIC-GSC.

4.1 Stochastic gradient algorithm 
4.1.1 Derivation 

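A stochastic gradient algorithm approximates the steepest descent algorithm, using an instantaneous gradient estimate. Given the cost function (41), the steepest descent algorithm iterates as follows

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\rho}{2}\left(-\frac{\partial J(\mathbf{w})}{\partial\mathbf{w}}\right)_{\mathbf{w}=\mathbf{w}[n]} = \mathbf{w}[n] + \rho\left(\mathcal{E}\{\mathbf{y}^n[k]\,y_0^{n,*}[k-\Delta]\} - \mathcal{E}\{\mathbf{y}^n[k]\,\mathbf{y}^{n,H}[k]\}\mathbf{w}[n] - \frac{1}{\mu}\mathcal{E}\{\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\}\mathbf{w}[n]\right), \tag{62}$$

with $\mathbf{w}[k], \mathbf{y}[k] \in \mathbb{C}^{NL\times 1}$, where $N$ denotes the number of input channels to the adaptive filter and $L$ the number of filter taps per channel. Replacing the iteration index $n$ by a time index $k$ and leaving out the expectation values $\mathcal{E}\{\cdot\}$, we obtain the following update equation

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}^n[k]\left(y_0^{n,*}[k-\Delta] - \mathbf{y}^{n,H}[k]\mathbf{w}[k]\right) - \underbrace{\frac{1}{\mu}\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\mathbf{w}[k]}_{\mathbf{r}[k]}\Big\}. \tag{63}$$

For $\frac{1}{\mu} = 0$ and no filtering $\mathbf{w}_0$ on the speech reference, equation (63) reduces to the update formula used in the GSC during periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, $i = 0, \ldots, M-1$). The additional term $\mathbf{r}[k]$ in the gradient estimate limits the speech distortion due to possible signal model errors.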

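Equation (63) requires knowledge of the correlation matrix $\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]$ or $\mathcal{E}\{\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\}$ of the clean speech. In practice, this information is not available. To avoid the need for calibration, speech + noise signal vectors $\mathbf{y}_{buf_1}$ are stored into a circular buffer $\mathbf{B}_1 \in \mathbb{R}^{N\times L_{buf_1}}$ during processing, as in [26]. During periods of noise only (i.e., when $y_i[k] = y_i^n[k]$, $i = 0, \ldots, M-1$), the filter $\mathbf{w}$ is updated using the following approximation of the term $\mathbf{r}[k] = \frac{1}{\mu}\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\mathbf{w}[k]$ in (63)

$$\frac{1}{\mu}\mathbf{y}^s[k]\,\mathbf{y}^{s,H}[k]\mathbf{w}[k] \approx \frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k]. \tag{64}$$

In the sequel, the subscripts $0:M-1$ in the adaptive filter $\mathbf{w}_{0:M-1}$ and the input vector $\mathbf{y}_{0:M-1}$ are omitted for the sake of conciseness.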



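This results in the update formula

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}[k]\left(y_0^*[k-\Delta] - \mathbf{y}^H[k]\mathbf{w}[k]\right) - \underbrace{\frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k]}_{\mathbf{r}[k]}\Big\} \tag{65}$$

during periods of noise only. In the sequel, a normalized step size $\rho$ is used, i.e.,

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H_{buf_1}[k]\,\mathbf{y}_{buf_1}[k] - \mathbf{y}^H[k]\,\mathbf{y}[k]\right| + \mathbf{y}^H[k]\,\mathbf{y}[k] + \delta} \tag{66}$$

where $\delta$ is a very small constant. The absolute value $\left|\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}\right|$ has been inserted to guarantee a positive valued estimate of the clean speech energy $\mathbf{y}^{s,H}[k]\,\mathbf{y}^s[k]$. Additional storage of noise only vectors $\mathbf{y}_{buf_2} \in \mathbb{C}^{NL\times 1}$ in a second buffer $\mathbf{B}_2 \in \mathbb{R}^{N\times L_{buf_2}}$ allows $\mathbf{w}$ to be adapted also during periods of speech + noise, using

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\Big\{\mathbf{y}_{buf_2}[k]\left(y^*_{0,buf_2}[k-\Delta] - \mathbf{y}^H_{buf_2}[k]\mathbf{w}[k]\right) + \frac{1}{\mu}\left(\mathbf{y}_{buf_2}[k]\,\mathbf{y}^H_{buf_2}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]\right)\mathbf{w}[k]\Big\} \tag{67}$$

with

$$\rho = \frac{\rho'}{\frac{1}{\mu}\left|\mathbf{y}^H[k]\,\mathbf{y}[k] - \mathbf{y}^H_{buf_2}[k]\,\mathbf{y}_{buf_2}[k]\right| + \mathbf{y}^H_{buf_2}[k]\,\mathbf{y}_{buf_2}[k] + \delta}. \tag{68}$$

In the sequel, we will, for reasons of conciseness, only consider the update procedure of the time-domain stochastic gradient algorithms during noise only, hence $\mathbf{y}[k] = \mathbf{y}^n[k]$. The extension towards updating during speech + noise periods with the use of a second, noise only buffer $\mathbf{B}_2$ is straightforward: the equations are found by replacing the noise-only input vectors $\mathbf{y}[k]$ by $\mathbf{y}_{buf_2}[k]$ and the speech + noise vectors $\mathbf{y}_{buf_1}[k]$ by the input speech + noise vector $\mathbf{y}[k]$.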
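Using

$$\mathbf{w}_\infty = \left(\frac{1}{\mu}\mathcal{E}\{\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1}\} + \left(1 - \frac{1}{\mu}\right)\mathcal{E}\{\mathbf{y}\mathbf{y}^H\}\right)^{-1}\mathcal{E}\{\mathbf{y}\,y_0^*[k-\Delta]\}, \tag{69}$$

where $\mathbf{y}$ is a noise-only vector, and (65), it can be shown that

$$\mathcal{E}\{\mathbf{w}[k+1] - \mathbf{w}_\infty\} = \left(\mathbf{I} - \rho\,\mathcal{E}\left\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + \left(1 - \frac{1}{\mu}\right)\mathbf{y}\mathbf{y}^H\right\}\right)^{k+1}\mathcal{E}\{\mathbf{w}[0] - \mathbf{w}_\infty\}. \tag{70}$$

Hence, the algorithm (65)-(67) is convergent in the mean provided that the step size $\rho$ is smaller than $\frac{2}{\lambda_{max}}$, with $\lambda_{max}$ the maximum eigenvalue of $\mathcal{E}\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\mathbf{y}\mathbf{y}^H\}$. The similarity of (65) with standard NLMS lets us presume that setting $\rho < \frac{2}{\sum_{i=1}^{NL}\lambda_i}$, with $\lambda_i$, $i = 1, \ldots, NL$ the eigenvalues of $\mathcal{E}\{\frac{1}{\mu}\mathbf{y}_{buf_1}\mathbf{y}^H_{buf_1} + (1 - \frac{1}{\mu})\mathbf{y}\mathbf{y}^H\} \in \mathbb{R}^{NL\times NL}$, or, in other words, setting

$$\rho < \frac{2}{\frac{1}{\mu}\mathcal{E}\{\mathbf{y}^H_{buf_1}\mathbf{y}_{buf_1}\} + \left(1 - \frac{1}{\mu}\right)\mathcal{E}\{\mathbf{y}^H\mathbf{y}\}} \tag{71}$$

guarantees convergence in the mean square. Equation (71) explains the normalization (66) and (68) for the step size $\rho$.

When the second order statistics of the noise are short-term stationary, $\mathbf{w}_\infty$ equals (36).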
However, since generally

$$\mathbf{y}[k]\,\mathbf{y}^H[k] \neq \mathbf{y}^n_{buf_1}[k]\,\mathbf{y}^{n,H}_{buf_1}[k], \tag{72}$$

the instantaneous gradient estimate in (65) is, compared to (63), additionally perturbed by

$$\frac{1}{\mu}\left(\mathbf{y}[k]\,\mathbf{y}^H[k] - \mathbf{y}^n_{buf_1}[k]\,\mathbf{y}^{n,H}_{buf_1}[k]\right)\mathbf{w}[k], \tag{73}$$

for $\mu \neq \infty$. Hence, for $\mu \neq \infty$, the update equation (65)-(67) suffers from a larger residual excess error than (63). The additional excess error grows for decreasing $\mu$, increasing step size $\rho$ and increasing vector length $LN$ of the vector $\mathbf{y}$, with $L$ the filter length per channel and $N$ the number of inputs to the adaptive filter. It is expected to be especially large for highly time-varying noise, e.g., multi-talker babble noise.
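To make the noise-only update concrete, the following sketch implements one iteration of (65) with the normalized step size (66) in NumPy. The function name `sg_update_noise_only` and its argument names are assumptions for this example; buffer management and speech detection are left out.

```python
import numpy as np

def sg_update_noise_only(w, y, y0_delayed, y_buf1, inv_mu, rho_prime, delta=1e-8):
    """One update of (65) with step size (66) during a noise-only period.

    w:          current filter, shape (NL,)
    y:          current noise-only input vector, shape (NL,)
    y0_delayed: delayed speech reference sample y0[k - Delta]
    y_buf1:     speech + noise vector taken from buffer B1, shape (NL,)
    inv_mu:     1/mu
    """
    y_H_w = np.vdot(y, w)                 # y^H w (np.vdot conjugates its first argument)
    err = np.conj(y0_delayed) - y_H_w     # y0*[k - Delta] - y^H w
    # r[k] ~ 1/mu * (y_buf1 y_buf1^H - y y^H) w, without forming the rank-one matrices
    r = inv_mu * (y_buf1 * np.vdot(y_buf1, w) - y * y_H_w)
    e_buf = np.vdot(y_buf1, y_buf1).real
    e_y = np.vdot(y, y).real
    rho = rho_prime / (inv_mu * abs(e_buf - e_y) + e_y + delta)   # (66)
    return w + rho * (y * err - r)

# 1/mu = 0: the update reduces to a plain NLMS step, as used in the GSC.
w_nlms = sg_update_noise_only(np.zeros(2), np.array([1.0, 2.0]), 3.0,
                              np.array([5.0, 5.0]), 0.0, 0.5, delta=0.0)

# 1/mu = 1: the regularization term r[k] contributes to the update.
w_reg = sg_update_noise_only(np.array([1.0, 0.0]), np.array([1.0, 0.0]), 1.0,
                             np.array([0.0, 2.0]), 1.0, 1.0, delta=0.0)
```

Computing the inner products first avoids building the $NL \times NL$ rank-one matrices, which keeps the cost per sample linear in $NL$.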

4.1.2 NLMS based algorithm 

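In [26], an LMS based implementation of the SDW-MWF has been proposed. Besides (64), some additional independence assumptions are made. Applying these assumptions to (65)-(67) results in an LMS based implementation of the SP-SDW-MWF similar to [26]. Assuming that

$$\frac{\sqrt{1/\mu}}{\sqrt{1 - 1/\mu}}\;\mathbf{y}_{buf_1}[k]\,y_0^*[k-\Delta] = 0 \tag{74}$$

$$\sqrt{\frac{1}{\mu}\left(1 - \frac{1}{\mu}\right)}\left(\mathbf{y}[k]\,\mathbf{y}^H_{buf_1}[k] + \mathbf{y}_{buf_1}[k]\,\mathbf{y}^H[k]\right) = 0 \tag{75}$$

hold, where the buffered vector $\mathbf{y}_{buf_1}[k]$ and the current vector $\mathbf{y}[k]$ stem from different time instants, (65) can be simplified to

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \frac{\rho'}{\mathbf{x}^H[k]\,\mathbf{x}[k] + \delta}\,\mathbf{x}[k]\left(d^*[k] - \mathbf{x}^H[k]\mathbf{w}[k]\right) \tag{76}$$

where

$$d[k] = \frac{y_0[k-\Delta]}{\sqrt{1 - \frac{1}{\mu}}}; \quad \mathbf{x}[k] = \sqrt{1 - \frac{1}{\mu}}\;\mathbf{y}[k] + \sqrt{\frac{1}{\mu}}\;\mathbf{y}_{buf_1}[k] \tag{77}$$

during periods of noise only (i.e., $\mathbf{y}[k] = \mathbf{y}^n[k]$). During speech + noise (i.e., $\mathbf{y}[k] = \mathbf{y}^s[k] + \mathbf{y}^n[k]$), $d[k]$ and $\mathbf{x}[k]$ in (76) are set to

$$d[k] = \frac{y_{0,buf_2}[k-\Delta]}{\sqrt{1 - \frac{1}{\mu}}}; \quad \mathbf{x}[k] = \sqrt{1 - \frac{1}{\mu}}\;\mathbf{y}_{buf_2}[k] + \sqrt{\frac{1}{\mu}}\;\mathbf{y}[k]. \tag{78}$$

Equations (74) and (75) assume that, besides speech and noise vectors, also noise vectors at different time instants are mutually uncorrelated. In practice, (74) and (75) do not hold, especially for large $\frac{1}{\mu}$ (i.e., $\mu \to 1$). Hence, compared to (65)-(67), performance is expected to be worse.

In addition, equations (76)-(78) cannot, in contrast to (65), be applied for $\mu < 1$. Compared to (65), no significant complexity reduction is achieved. The LMS based updating (76) requires $4NL + 3$ Multiply-Accumulate (MAC) operations per sample, whereas update formula (65) requires $4NL + 5$ MAC per sample. The computation of the normalized step size in (76) requires $NL + 2$ less MAC per sample than in (65).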
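The restriction on $\mu$ follows directly from the square roots in the definitions of $d[k]$ and $\mathbf{x}[k]$. The sketch below assumes the noise-only definitions $d[k] = y_0[k-\Delta]/\sqrt{1 - 1/\mu}$ and $\mathbf{x}[k] = \sqrt{1 - 1/\mu}\,\mathbf{y}[k] + \sqrt{1/\mu}\,\mathbf{y}_{buf_1}[k]$ together with a standard NLMS step on $(\mathbf{x}, d)$; the function name `lms_based_update` and the explicit range check are additions for this example.

```python
import math
import numpy as np

def lms_based_update(w, y, y0_delayed, y_buf1, inv_mu, rho_prime, delta=1e-8):
    """NLMS-type step on x[k], d[k] during a noise-only period (assumed form).

    Only defined for 0 <= 1/mu < 1: for 1/mu > 1 (mu < 1) the square root is
    not real, and for 1/mu = 1 the scaling of d[k] divides by zero.
    """
    if not 0.0 <= inv_mu < 1.0:
        raise ValueError("requires mu > 1, i.e. 0 <= 1/mu < 1")
    a = math.sqrt(1.0 - inv_mu)
    x = a * y + math.sqrt(inv_mu) * y_buf1
    d = y0_delayed / a
    err = np.conj(d) - np.vdot(x, w)              # d*[k] - x^H w
    return w + rho_prime / (np.vdot(x, x).real + delta) * x * err

y = np.array([1.0, 2.0])
y_buf1 = np.array([5.0, 5.0])

# 1/mu = 0: x = y and d = y0, i.e. the plain NLMS step of the GSC.
w_gsc = lms_based_update(np.zeros(2), y, 3.0, y_buf1, 0.0, 0.5, delta=0.0)

# mu = 0.5 (1/mu = 2) is outside the admissible range.
try:
    lms_based_update(np.zeros(2), y, 3.0, y_buf1, 2.0, 0.5)
    rejected = False
except ValueError:
    rejected = True
```

The stochastic gradient update (65) has no such restriction, since it uses $\frac{1}{\mu}$ only as a linear weight.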

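4.1.3 Frequency-domain implementation

As stated before, the stochastic gradient algorithms (65)-(67) and (76) are expected to suffer from a large excess error for large $\frac{1}{\mu}$ and/or highly time-varying noise, due to a large difference between the rank-one noise correlation matrices $\mathbf{y}^n[k]\,\mathbf{y}^{n,H}[k]$ measured at different time instants $k$. The gradient estimate can be improved by replacing

$$\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k] \tag{79}$$

in (65) with the time-average

$$\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l] - \frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}[l]\,\mathbf{y}^H[l], \tag{80}$$

where $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}_{buf_1}[l]\,\mathbf{y}^H_{buf_1}[l]$ is updated during periods of speech + noise and $\frac{1}{K}\sum_{l=k-K+1}^{k}\mathbf{y}[l]\,\mathbf{y}^H[l]$ during periods of noise only. However, this would require expensive matrix operations. A block-based implementation intrinsically performs this averaging:

$$\mathbf{w}[(k+1)K] = \mathbf{w}[kK] + \frac{\rho}{K}\left[\sum_{i=0}^{K-1}\mathbf{y}[kK+i]\left(y_0^*[kK+i-\Delta] - \mathbf{y}^H[kK+i]\,\mathbf{w}[kK]\right) - \frac{1}{\mu}\sum_{i=0}^{K-1}\left(\mathbf{y}_{buf_1}[kK+i]\,\mathbf{y}^H_{buf_1}[kK+i] - \mathbf{y}[kK+i]\,\mathbf{y}^H[kK+i]\right)\mathbf{w}[kK]\right]. \tag{81}$$

Note that the output $y_0[k-\Delta] - \mathbf{w}^H\mathbf{y}[k]$ of the algorithm still has to be computed.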
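A block update of the form (81) can be sketched as follows, assuming the $K$ input vectors of a block are stacked as rows of a matrix and a fixed (unnormalized) step size is used. The name `block_update` and the row-stacking convention are assumptions for this example.

```python
import numpy as np

def block_update(w, Y, y0_delayed, Y_buf1, inv_mu, rho):
    """Block-based update in the spirit of (81), all terms evaluated at w[kK].

    Y, Y_buf1:  (K, NL) arrays, row i holds y[kK+i] resp. y_buf1[kK+i]
    y0_delayed: (K,) array with y0[kK+i-Delta]
    """
    K = Y.shape[0]
    err = np.conj(y0_delayed) - Y.conj() @ w          # y0* - y^H w for each sample
    grad = Y.T @ err                                  # sum_i y_i (y0_i* - y_i^H w)
    reg = Y_buf1.T @ (Y_buf1.conj() @ w) - Y.T @ (Y.conj() @ w)
    return w + (rho / K) * (grad - inv_mu * reg)

Y = np.array([[1.0, 0.0], [0.0, 2.0]])
Y_buf1 = np.array([[1.0, 1.0], [0.0, 1.0]])
y0_delayed = np.array([1.0, 2.0])
w_next = block_update(np.array([0.5, 0.5]), Y, y0_delayed, Y_buf1, inv_mu=0.5, rho=0.1)
```

Because every term in the block uses the same filter $\mathbf{w}[kK]$, the correction is the average of $K$ instantaneous gradient estimates, which is the averaging effect described above.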



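The gradient and hence also $\mathbf{y}_{buf_1}[k]\,\mathbf{y}^H_{buf_1}[k] - \mathbf{y}[k]\,\mathbf{y}^H[k]$ is averaged over $K$ iterations prior to making adjustments to $\mathbf{w}$. This goes at the expense of a reduced (i.e., by a factor $K$) convergence rate.

The block-based implementation is computationally more efficient when it is implemented in the frequency-domain, especially for large filter lengths. In addition, in a frequency-domain implementation, each frequency bin gets its own step size. Although the frequency-domain and time-domain implementations obtain the same MSE, the improvement in SNR, which is determined by the excess errors in each frequency bin, may be different. In a time-domain implementation, one common step size $\rho$ is used for the different frequency bins. The convergence rate depends on the eigenvalue spread of the input correlation matrix. For frequency bins with little power, this common step size will be smaller than in the frequency-domain approach, resulting in a slower convergence, whereas for frequency bins with much power it will be larger than in the frequency-domain approach, resulting in larger LMS excess errors. As a result, in a time-domain implementation, the power spectrum of the input signals not only determines the convergence rate but also the improvement $\Delta\mathrm{SNR}_{intellig}$. In a frequency-domain implementation, the different bins have a similar convergence rate and hence also excess error. Hence, the SNR improvement in each frequency bin is more controlled (i.e., less dependent on the input power spectrum). Since signal model errors (e.g., microphone mismatch) modify the power spectrum of the noise references and hence the improvement $\Delta\mathrm{SNR}_{intellig}$ of a time-domain implementation, frequency-domain implementations are more appropriate to evaluate the performance of the algorithms when signal model errors are present.

Algorithm 1 summarizes a frequency-domain implementation based on overlap-save of (65)-(67). By storing the FFT-transformed speech + noise and noise only vectors in the buffers $\mathbf{B}_1$ and $\mathbf{B}_2$, respectively, instead of storing the time-domain vectors, $N$ FFT operations have been saved. When adapting during speech + noise, also the time-domain vector

$$\left[\,y_0[kL-\Delta] \;\; \cdots \;\; y_0[kL-\Delta+L-1]\,\right]^T \tag{82}$$

should then be stored in an additional buffer $\mathbf{B}_{2,0}$ during periods of noise only, which requires additional storage compared to when the time-domain vectors are stored. Note that in Algorithm 1 a common trade-off parameter $\mu$ is used in all frequency bins. Alternatively, a different setting for $\mu$ can be used in different frequency bins, e.g., $\mu$ could be set to $\infty$ at those frequencies where the GSC is sufficiently robust, e.g., for small-sized arrays at

Since the input signals are real, half of the FFT components are complex conjugated. Hence, in practice only half of the FFT components have to be stored.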


P'^'P^^cy domain stoc h astic gradient SP -SDW-MWF based on overlap-save. 
W,M = [o ... 0f;i = M-N,...,M-l 

imIOJ = m a 0, .... 2L - 1 

Matrix definitions: 

^"'[oi Oi ° It ]:F = 2ix2I,l»Tinaliix 

For each new block •fj\r£ input samples: 
• ffmlse detected: 

[ MkL - AJ ... »8[*i - A + i - 1] _ noisebttffer B,.o 
2. Y,-lAJ»diaB{p[wIfcr-£] ... t«lfcL + L-l}f}.i = M-JV.....M-l 
Y4*]=diag{[Bx(i.O) ... ^^{i.2L-l)f\,i^M-If,„.,M-1 
cycUcaUy shift each row* of Bi over 2J0 sanities, i 4 Af-iV M-1 
«n&l-[0 ... a |to[fcli_A} .... yolJUi-A+ii-i"]f 



1. F[„4fcL-£] ... V#L + i-llf,i = M-;V....,M-l-.speech4-noi8eba&rB 

2. Y,-lfcl = dia8{[B.«.0) ... B,(i.2£^l)]n,« = M-Ar.....Jl^-l 
cyclicaUy shift eadi rowt of Ba over 2L sanqiles. i Jm -N M-1 
Y*[fcI=diag{F[i,#i-£] ... yi[kL + L-l]f},irM-N.....M-l 
d(fcj = [0 ... 0 B3.o(l,0) ... Ba.o(l,i;-l) r " 

qwhoalty shift B3.0 over isanqiles 

• Updatefonnula: 

ESlW = Fk'ex[fc]; = Fk'ea [fcj; E[fc] = pfcTelfcJ 

2. A{*1 = agldiag .... J^^»_,[fc]} 

= 7l>«(* - U + (1 - 7) fc-T^.. I YJ„|» + i (,v,.„,» _ |V7.„ |») I) 

3. W,Ifc + i] „ w,[fcI + Fgp-^A(*l {YrWB-W - i(Y^(ik) - Y?EIlJfc])}, i= M-iV. 

• O«tput:yo[fc]=[ jtotfcii-AJ ... tt,[ibi- A + i-lj f 

- If noiae detected: yo«,(A!] = yo(fcJ - y.u^itJkJ 

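  - If speech detected: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{y}_{out,1}[k]$

Algorithm 2: Frequency-domain NLMS based SP-SDW-MWF

Initialization:
  $\mathbf{W}_i[0] = [0 \;\cdots\; 0]^T$, $i = M-N, \ldots, M-1$
  $P_m[0] = \delta_m$, $m = 0, \ldots, 2L-1$

Matrix definitions:
  $\mathbf{g} = \begin{bmatrix} \mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L \end{bmatrix}$; $\mathbf{k} = [\mathbf{0}_L \;\; \mathbf{I}_L]$; $\mathbf{F}$ = $2L \times 2L$ DFT matrix

For each new block of NL input samples:

• If noise detected:
  1. $\mathbf{F}[y_i[kL-L] \;\cdots\; y_i[kL+L-1]]^T \rightarrow$ noise buffer $\mathbf{B}_2$, $i = M-N, \ldots, M-1$
     $[y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]]^T \rightarrow$ noise buffer $\mathbf{B}_{2,0}$
  2. $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \;\cdots\; y_i[kL+L-1]]^T\}$, $i = M-N, \ldots, M-1$
     $\mathbf{Y}_{i,buf_1}[k] = \mathrm{diag}\{[\mathbf{B}_1(i,0) \;\cdots\; \mathbf{B}_1(i,2L-1)]^T\}$, $i = M-N, \ldots, M-1$
     $\mathbf{X}_i[k] = \sqrt{1-\tfrac{1}{\mu}}\,\mathbf{Y}_i[k] + \sqrt{\tfrac{1}{\mu}}\,\mathbf{Y}_{i,buf_1}[k]$, $i = M-N, \ldots, M-1$
     cyclically shift each row $i$ of buffer $\mathbf{B}_1$ over $2L$ samples
     $\mathbf{d}[k] = \frac{1}{\sqrt{1-1/\mu}}\,[0 \;\cdots\; 0 \;\; y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]]^T$

• If speech detected:
  1. $\mathbf{F}[y_i[kL-L] \;\cdots\; y_i[kL+L-1]]^T \rightarrow$ speech + noise buffer $\mathbf{B}_1$, $i = M-N, \ldots, M-1$
  2. $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \;\cdots\; y_i[kL+L-1]]^T\}$, $i = M-N, \ldots, M-1$
     $\mathbf{Y}_{i,buf_2}[k] = \mathrm{diag}\{[\mathbf{B}_2(i,0) \;\cdots\; \mathbf{B}_2(i,2L-1)]^T\}$, $i = M-N, \ldots, M-1$
     $\mathbf{X}_i[k] = \sqrt{1-\tfrac{1}{\mu}}\,\mathbf{Y}_{i,buf_2}[k] + \sqrt{\tfrac{1}{\mu}}\,\mathbf{Y}_i[k]$, $i = M-N, \ldots, M-1$
     cyclically shift each row $i$ of buffer $\mathbf{B}_2$ over $2L$ samples
     $\mathbf{d}[k] = \frac{1}{\sqrt{1-1/\mu}}\,[0 \;\cdots\; 0 \;\; \mathbf{B}_{2,0}(1,0) \;\cdots\; \mathbf{B}_{2,0}(1,L-1)]^T$
     cyclically shift $\mathbf{B}_{2,0}$ over $L$ samples

• Update formula:
  1. $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\left(\mathbf{d}[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{X}_j[k]\mathbf{W}_j[k]\right)$
  2. $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
     $P_m[k] = \gamma P_m[k-1] + (1-\gamma)\sum_{j=M-N}^{M-1}|X_{j,m}|^2$
  3. $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\mathbf{X}_i^H[k]\mathbf{E}[k]$, $i = M-N, \ldots, M-1$

• Output: $\mathbf{y}_{out}[k] = \mathbf{y}_0[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\mathbf{W}_j[k]$
  $\mathbf{y}_0[k] = [y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]]^T$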
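
The overlap-save filtering behind the output term $\mathbf{k}\mathbf{F}^{-1}\mathbf{Y}_j[k]\mathbf{W}_j[k]$ can be illustrated with a minimal NumPy sketch. It is only an illustration under simplifying assumptions: a single real-valued channel, plain convolution without the conjugation convention of the filters, and a hypothetical function name `overlap_save_block`.

```python
import numpy as np

def overlap_save_block(w, x_prev, x_curr):
    """Filter one block of L samples with an L-tap filter using a 2L-point FFT."""
    L = len(w)
    # Zero-padding the filter to 2L taps mirrors the constraint enforced by g.
    W = np.fft.fft(np.concatenate([w, np.zeros(L)]))
    # The FFT of [previous block, current block] plays the role of diag(Y_i[k]).
    Y = np.fft.fft(np.concatenate([x_prev, x_curr]))
    y_circ = np.fft.ifft(Y * W).real
    # k = [0_L I_L] keeps the last L samples, which are free of circular wrap-around.
    return y_circ[L:]

rng = np.random.default_rng(0)
L = 8
w = rng.standard_normal(L)
x = rng.standard_normal(4 * L)

# Process three consecutive blocks and compare with direct time-domain convolution.
blocks = [overlap_save_block(w, x[i - L:i], x[i:i + L]) for i in range(L, 4 * L, L)]
y_fd = np.concatenate(blocks)
y_td = np.convolve(x, w)[L:4 * L]
```

The two outputs agree to numerical precision, which is the property that lets the algorithms above replace the time-domain filtering by one FFT per channel and block.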



high frequencies. In that case, only a few frequency components of $\mathbf{Y}_i$ should be stored in the
speech + noise buffer.

4.2 Improvement of stochastic gradient algorithm

To achieve a reliable estimate (80) of the average correlation matrix $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ in a highly
time-varying noise scenario, the averaging window $K$ should be larger than $LN$. Hence, the averaging in
the block-based or frequency-domain implementation proposed in Section 4.1 does not suffice to obtain
a reliable estimate. The performance can be improved by applying a low pass filter to the part of the
gradient estimate that takes speech distortion into account, i.e., the term $\mathbf{r}[k]$ in (65). The low pass
filtering avoids a highly time-varying distortion of the desired speech component while not degrading
the tracking performance needed in non-stationary noise scenarios.

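4.2.1 Concept

Define $\mathbf{w}_s$ as

$\mathbf{w}_s = \mathbf{w} - \mathbf{w}_n$   (83)

$\mathbf{w}_s \in \mathrm{Range}\{E\{\mathbf{y}^s\mathbf{y}^{s,H}\}\}$

Then, the desired speech component $z^s[k]$ at the output equals

$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_s^H\mathbf{y}^s[k]$   (86)

Assume that $\mathbf{w}_s$ varies slowly in time. This is desired since a fast changing $\mathbf{w}_s$ results in a highly
time-varying distortion of the desired speech component. In addition, in hearing aid applications,
room acoustics and the average desired speaker position do not change quickly in time. Fast changes in the
noise scenario can be tracked by the filter $\mathbf{w}_n$. This will be illustrated below.

Then,

$E\{\mathbf{y}^s\mathbf{y}^{s,H}\}\mathbf{w}[k] = E\{\mathbf{y}^s\mathbf{y}^{s,H}\}\mathbf{w}_s$

can be approximated by

$E\{\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}\mathbf{y}^H\}\mathbf{w}_s = E\{(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}\mathbf{y}^H)\mathbf{w}_s\}$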

»A faroe tr 4. 7 M • ^ ^ '"Sber order statistics are allowed to vary fister in time 

«wC.jj;^K'i^^;:Tt:SaX^T^.^^^^ 

"Just like for the matrix based alporithm, • , ^ direct-to-reveiberant ratio of flie desired speedi is high. 
Uu.itcanbees.i^^iTiS'S^;';:^-'^*^ 



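where $\mathbf{y}$ is a vector during noise only. Using the independence assumption [40]

$E\{\mathbf{y}^n\mathbf{y}^{n,H}[k]\mathbf{w}_n[k]\} = E\{\mathbf{y}^n\mathbf{y}^{n,H}[k]\}E\{\mathbf{w}_n[k]\}$   (89)

and $E\{\mathbf{y}^n\mathbf{y}^{n,H}\} = E\{\mathbf{y}^n_{buf_1}\mathbf{y}^{n,H}_{buf_1}\}$, we find that

$E\{(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}\mathbf{y}^H)\mathbf{w}_s\} = E\{(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}\mathbf{y}^H)\mathbf{w}[k]\}$.   (90)

Replacing the expectation value by time averaging, $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}\mathbf{w}[k]$ can be estimated as

$\frac{1}{K}\sum_{l=k-K+1}^{k}\left(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H[l] - \mathbf{y}\mathbf{y}^H[l]\right)\mathbf{w}[l]$   (91)

during noise only. The value $K$ determines the convergence rate of the filter $\mathbf{w}_s$.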

^'^^.ilT'^'' 'V*''"" " ^""t"""'^'' ofe{rr,^}. the long-term averaged noise correlation 
'"'^ ^ E,=fc-if yy^Pl and ^ El=ib-jc y£.AyS?i W shouldnot differ too much from each other This 
<^'«>*'^9«iresihata,esecondorderstatisticsqfthenoise 

It Slices that they are short-term stationary so Omt they can be estimated during noise only periods 

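The averaging operation (91) is performed by applying the following low pass filter to the regularization
term $\frac{1}{\mu}(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}\mathbf{y}^H)\mathbf{w}[k]$ in (65):

$\mathbf{r}[k] = \tilde{\lambda}\,\mathbf{r}[k-1] + (1-\tilde{\lambda})\frac{1}{\mu}\left(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}\mathbf{y}^H\right)\mathbf{w}[k]$,   (92)

where $\tilde{\lambda} < 1$. This corresponds to an averaging window $K$ of about $\frac{1}{1-\tilde{\lambda}}$ samples. The normalized step size
$\rho$ is modified into

$\rho = \frac{\rho'}{r_{avg}[k] + \mathbf{y}^H\mathbf{y} + \delta}$   (93)

$r_{avg}[k] = \tilde{\lambda}\,r_{avg}[k-1] + (1-\tilde{\lambda})\frac{1}{\mu}\left|\mathbf{y}_{buf_1}^H\mathbf{y}_{buf_1} - \mathbf{y}^H\mathbf{y}\right|$.   (94)

Compared to (65), (92) requires $3NL - 1$ additional MAC and extra storage of an $NL \times 1$ vector $\mathbf{r}[k]$.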
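
A minimal NumPy sketch of one recursion of the low-pass filtered regularization term may help to see why only vector operations are needed. The function name, the argument names and the form of the step-size normalisation (averaged regularization energy plus noise energy plus a small constant) are assumptions made for this illustration, not a definitive implementation.

```python
import numpy as np

def lp_regularisation_step(r_prev, r_avg_prev, w, y_buf, y_noise,
                           mu, lam, rho_prime, delta=1e-8):
    """One recursion of the low-pass filtered regularization term and step size."""
    inv_mu = 1.0 / mu
    # (y y^H) w is evaluated as y (y^H w), so no NL x NL matrix is ever formed.
    inst = y_buf * np.vdot(y_buf, w) - y_noise * np.vdot(y_noise, w)
    r = lam * r_prev + (1.0 - lam) * inv_mu * inst
    # Scalar energy term that is low-pass filtered for the step-size normalisation.
    e_buf = np.vdot(y_buf, y_buf).real
    e_noise = np.vdot(y_noise, y_noise).real
    r_avg = lam * r_avg_prev + (1.0 - lam) * inv_mu * abs(e_buf - e_noise)
    rho = rho_prime / (r_avg + e_noise + delta)
    return r, r_avg, rho

rng = np.random.default_rng(1)
NL = 6
w = rng.standard_normal(NL)
y_buf = rng.standard_normal(NL)      # vector taken from the speech + noise buffer
y_noise = rng.standard_normal(NL)    # current noise-only vector

# With lam = 0 the recursion reduces to the instantaneous regularization term.
r, r_avg, rho = lp_regularisation_step(np.zeros(NL), 0.0, w, y_buf, y_noise,
                                       mu=2.0, lam=0.0, rho_prime=0.8)
r_matrix = 0.5 * (np.outer(y_buf, y_buf) - np.outer(y_noise, y_noise)) @ w
```

The comparison with `r_matrix` confirms that the vector form gives the same result as the explicit matrix product, at a cost that grows linearly with NL.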
4.2.2 Frequency-domain 

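Equation (92) can be extended to the frequency-domain. The update equation for $\mathbf{W}_i[k+1]$ in Algorithm 1
then becomes:

$\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left(\mathbf{Y}_i^{n,H}[k]\mathbf{E}[k] - \mathbf{R}_i[k]\right)$;
$\mathbf{R}_i[k] = \lambda\mathbf{R}_i[k-1] + (1-\lambda)\frac{1}{\mu}\left(\mathbf{Y}_i^H[k]\mathbf{E}_1[k] - \mathbf{Y}_i^{n,H}[k]\mathbf{E}_2[k]\right)$   (95)

with

$\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\left(\mathbf{d}[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\mathbf{W}_j[k]\right)$;   (96)

$\mathbf{E}_1[k] = \mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\mathbf{W}_j[k]$; $\mathbf{E}_2[k] = \mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j^n[k]\mathbf{W}_j[k]$;   (97)

and $\mathbf{\Lambda}[k]$ computed as follows:

$\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
$P_m[k] = P_{1,m}[k] + P_{2,m}[k]$
$P_{1,m}[k] = \gamma P_{1,m}[k-1] + (1-\gamma)\sum_{j=M-N}^{M-1}|Y^n_{j,m}|^2$
$P_{2,m}[k] = \lambda P_{2,m}[k-1] + (1-\lambda)\frac{1}{\mu}\left|\sum_{j=M-N}^{M-1}\left(|Y_{j,m}|^2 - |Y^n_{j,m}|^2\right)\right|$   (98)

Compared to Algorithm 1, (95)-(98) requires one extra $2L$-point FFT and $8NL - 2N - 2L$ extra MAC per
$L$ samples and additional memory storage of a $2NL \times 1$ real data vector. To obtain the same time constant
in the averaging operation as in the time-domain version with $K = 1$, $\lambda$ should equal $\tilde{\lambda}^L$.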

Experimental results in Section 4.3 will show that the performance of the stochastic gradient algorithm
improves significantly when the low pass filter is applied, especially for large $\lambda$.
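
The relation between the forgetting factor of the block-based frequency-domain recursion and that of a sample-by-sample time-domain recursion can be checked with a short calculation. The sketch below assumes that one block update stands for $L$ sample updates and uses the values $\lambda = 0.9998$, $L = 32$ and a 16 kHz sampling rate quoted in the experiments; the function names are hypothetical.

```python
def block_forgetting_factor(lam_sample: float, block_len: int) -> float:
    """Forgetting factor per block that matches a per-sample factor over block_len samples."""
    return lam_sample ** block_len

def averaging_window(lam: float) -> float:
    """Approximate averaging window (in updates) of a first-order low-pass filter."""
    return 1.0 / (1.0 - lam)

L = 32
lam_block = 0.9998
# Per-sample factor that yields the same time constant as lam_block per block.
lam_sample = lam_block ** (1.0 / L)

window_blocks = averaging_window(lam_block)     # about 5000 blocks
window_seconds = window_blocks * L / 16000.0    # about 10 s at 16 kHz
```

Under these assumptions the low pass filter averages over roughly ten seconds of data, far longer than the $LN$ samples covered by a single block.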

4.2.3 Complexity of different stochastic gradient algorithms 

Table 1 summarizes the computational complexity (expressed as the number of real multiply-accumulates
(MAC), divisions (D), square roots (Sq) and absolute values (Abs)) of the time-domain (TD) and frequency-
domain (FD) Stochastic Gradient (SG) and NLMS based algorithms. Comparison is made with standard
NLMS and the NLMS based SPA. We assume that one complex multiplication is equivalent to 4 real mul-
tiplications and 2 real additions. A $2L$-point FFT of a real input vector requires $2L\log_2 2L$ real MAC
"ocmtedas theiMimberof mtiltiply-accumnlate. addittons aadmd^^ 



(assuming radix-2 FFT algorithms).

Table 1 indicates that the TD-SG without filter $\mathbf{w}_0$ and the SPA are about twice as complex as the
standard ANC. When applying a Low Pass filter (LP) to the regularization term, the TD-SG algorithm
has about three times the complexity of the ANC. The increase in complexity of the frequency-domain
implementations is smaller.

Table 1: Computational complexity of TD and FD NLMS and stochastic gradient algorithms (expressed as
number of real MAC, divisions (D), absolute values (Abs) and square roots (Sq) per sample). Entries marked
… are illegible in the source.

     Algorithm               Update formula                                            Adaptation of step size
TD   NLMS ANC                ((2M - 2)L + 1) MAC                                       1 D + (M - 1)L MAC
     NLMS based SPA          (4(M - 1)L + 1) MAC + 1 D + 1 Sq                          1 D + (M - 1)L MAC
     SG                      (4NL + 5) MAC                                             1 D + 1 Abs + (2NL + 2) MAC
     NLMS based algorithm    (4NL + 3) MAC                                             …
     SG with LP              (7NL + 4) MAC                                             1 D + 1 Abs + (2NL + 4) MAC
FD   NLMS ANC                (10M - 7 - …) + (6M - 2) log2 2L MAC                      1 D + (2M + 2) MAC
     NLMS based SPA          (14M - 11 - …) + (6M - 2) log2 2L MAC + 1/L Sq + 1/L D    1 D + (2M + 2) MAC
     SG                      (18N + 6 - …) + (6N + 8) log2 2L MAC                      1 D + 1 Abs + (4N + 4) MAC
     NLMS based algorithm    (16N + 4 - …) + (6N + 6) log2 2L MAC                      1 D + (2N + 2) MAC
     SG with LP              (26N + 4 - …) + (6N + 10) log2 2L MAC                     1 D + 1 Abs + (4N + 6) MAC

Remark: In Table 1 and Figure 8, the complexity of the time-domain and frequency-domain NLMS ANC and
NLMS based SPA represents the complexity when the adaptive filter is only updated during noise only. If
the adaptive filter is also updated during speech + noise using data from a noise buffer, the time-domain im-
plementations require $NL$ additional MAC per sample and the frequency-domain implementations require
2 additional FFT and $(4L(M-1) - 2(M-1) + L)$ MAC per $L$ samples.

As an illustration, Figure 8 plots the complexity (expressed as the number of Mega operations per second
(Mops)) of the time-domain and frequency-domain stochastic gradient algorithm with LP filtering as a
function of $L$ for $M = 3$ and a sampling frequency $f_s = 16$ kHz. Comparison is made with the NLMS-
based ANC of the GSC and the SPA. The complexity of the FD SPA is not depicted, since for small $M$
it is comparable to the cost of the FD-NLMS ANC. For $L > 8$, the frequency-domain implementations
result in a significantly lower complexity compared to their time-domain equivalents. The computational
cost of the FD stochastic gradient algorithm with LP is limited, making it a good alternative to the SPA for
implementation in hearing aids.

4.3 Experimental results

In this Section, we evaluate the performance of the different FD stochastic gradient algorithms based on
experimental results for a hearing aid application. Comparison is made with the FD-NLMS based SPA. For
a fair comparison, the FD-NLMS based SPA is, like the stochastic gradient algorithms, also adapted during
speech + noise using data from a noise buffer.

4.3.1 Set-up 

A three-microphone Behind-The-Ear (BTE) hearing aid with three omnidirectional microphones (Knowles
FG-3452) has been mounted on a dummy head in an office room. The interspacing $d$ between the first and
the second microphone is about $d = 1$ cm and the interspacing between the second and third microphone
about 1.5 cm. The reverberation time $T_{60dB}$ is about 700 ms for a speech weighted noise. The desired speech
signal and the noise signals are uncorrelated. The desired speech source consists of sentences spoken by a
male speaker. Both the speech and the noise signal have a level of 70 dB SPL at the center of the head. The
desired speech source and noise sources are positioned at a distance of 1 meter from the head: the speech
source in front of the head, the noise sources at an angle $\theta$ w.r.t. the speech source. For evaluation purposes,
the speech and noise signal have been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [37], and the
output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means of
recordings of an anechoic speech weighted noise signal positioned at 0°, measured while the microphone
array was mounted on the head. A delay-and-sum beamformer is used as a fixed beamformer, since, in case
of small microphone interspacing, it is robust to model errors. The blocking matrix $\mathbf{B}$ pairwise subtracts
the time aligned calibrated microphone signals.
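
The spatial pre-processor described here can be sketched in a few lines of NumPy. The sketch assumes that the microphone signals are already time aligned and calibrated, uses a plain average as delay-and-sum beamformer, and introduces a hypothetical function name; the 4 dB gain mismatch is the one studied in the experiments below.

```python
import numpy as np

def spatial_preprocessor(mics: np.ndarray):
    """mics: (M, T) array of time-aligned, calibrated microphone signals."""
    speech_ref = mics.mean(axis=0)        # delay-and-sum with zero steering delays
    noise_refs = mics[:-1] - mics[1:]     # pairwise differences give M - 1 noise references
    return speech_ref, noise_refs

rng = np.random.default_rng(2)
s = rng.standard_normal(1000)             # stand-in for the desired speech signal

# Perfectly matched microphones: the blocking matrix cancels the speech completely.
matched = np.tile(s, (3, 1))
_, leak_matched = spatial_preprocessor(matched)

# A 4 dB gain mismatch at the second microphone lets speech leak into the noise references.
mismatched = matched.copy()
mismatched[1] *= 10 ** (4 / 20)
_, leak_mismatched = spatial_preprocessor(mismatched)
```

The leakage in the mismatched case is the mechanism by which the adaptive stage of a standard GSC starts to cancel the desired speech.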

The performance of the FD stochastic gradient algorithms is evaluated for a filter length $L = 32$ taps per
channel, $\rho' = 0.8$ and $\gamma = 0$. To exclude the effect of the spatial pre-processor, the performance measures
are calculated w.r.t. the output of the fixed beamformer. The sensitivity of the algorithms against errors in
the assumed signal model is illustrated for microphone mismatch, e.g., a gain mismatch $\Upsilon_2 = 4$ dB of the
second microphone. Among the different possible signal model errors, especially microphone mismatch
was found to be harmful to the performance of the GSC in a hearing aid application [17]. In hearing aids,
microphones are rarely matched in gain and phase. In [3], gain and phase differences between microphone
characteristics of up to 6 dB and 10°, respectively, have been reported.

4.3.2 Comparison of different FD stochastic gradient techniques

Figure 9(a) and (b) compare the performance of the different FD Stochastic Gradient (SG) SP-SDW-MWF
algorithms without $\mathbf{w}_0$ (i.e., the SDR-GSC) as a function of the trade-off parameter $\mu$ for a stationary and
non-stationary (e.g., multi-talker babble) noise source, respectively, at 90°. To analyze the impact of the
approximation (64) on the performance, the result of a FD implementation of (63), which uses the clean
speech, is depicted too. For both noise scenarios, the stochastic gradient algorithm significantly outperforms
the NLMS based algorithm, especially for large $1/\mu$. Without Low Pass (LP) filter, both algorithms achieve
a worse improvement compared to (63), especially for large $1/\mu$. For a stationary speech-like noise source,
the FD-SG algorithm does not suffer too much from approximation (64). In a highly time-varying noise
scenario, such as multi-talker babble, the limited averaging of $\mathbf{r}[k]$ in the FD implementation does not
suffice to maintain the large noise reduction achieved by (63). The loss in noise reduction performance
could be reduced by decreasing the step size $\rho'$, at the expense of a reduced convergence speed. Applying
the low pass filter (95) significantly improves the performance for all $1/\mu$, while changes in the noise scenario
can still be tracked.

Figure 10 plots the SNR improvement $\Delta SNR_{intellig}$ and the speech distortion $SD_{intellig}$ of the SP-SDW-MWF
($1/\mu = 0.5$) with and without filter $\mathbf{w}_0$ for different values of $\lambda$. The performance clearly improves for
increasing $\lambda$. For small $\lambda$, the SP-SDW-MWF with $\mathbf{w}_0$ performs worse than the SP-SDW-MWF without
$\mathbf{w}_0$. This is due to the larger dimensions of $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}$.

The LP filter avoids that the desired speech is distorted by a highly time-varying filter $\mathbf{w}_s$. In contrast
to a decrease in step size $\rho'$, the filter does not compromise tracking of changes in the noise scenario.
As an illustration, Figure 11 plots the convergence behavior of the FD stochastic gradient algorithm without
$\mathbf{w}_0$ (i.e., the SDR-GSC) for $\lambda = 0$ and $\lambda = 0.9998$, respectively, when the noise source position suddenly
changes from 90° to 180°. Again, a gain mismatch $\Upsilon_2$ of 4 dB was applied to the second microphone. To
avoid fluctuations in the residual noise energy and speech distortion, the desired source and the noise
source in this experiment are stationary and speech-like. The upper figure depicts the residual noise energy
as a function of the number of input samples, the lower figure plots the residual speech distortion during
speech + noise periods as a function of the number of speech + noise samples. Both algorithms (i.e., $\lambda = 0$
and $\lambda = 0.9998$) have about the same initial convergence rate. When the change in position occurs, the
algorithm with $\lambda = 0.9998$ even converges faster. For $\lambda = 0$, the approximation error (64) remains large for
a while, since the noise vectors in the buffer are not up to date. For $\lambda = 0.9998$, the impact of the
instantaneous large approximation error is reduced thanks to the low pass filter.

4.3.3 Comparison with SPA

Figures 12 and 13 compare the performance of the FD stochastic gradient algorithm with LP filter
and the FD-NLMS based SPA in a multiple noise source scenario. The noise scenario consists
of 5 multi-talker babble noise sources positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired
source at 0°. To assess the sensitivity of the algorithms against errors in the assumed signal model, the
influence of microphone mismatch, e.g., a gain mismatch $\Upsilon_2 = 4$ dB of the second microphone, on the
performance is depicted too. In Figure 12, the SNR improvement $\Delta SNR_{intellig}$ and the distortion $SD_{intellig}$
of the SP-SDW-MWF with and without filter $\mathbf{w}_0$ is depicted as a function of the trade-off factor $1/\mu$.
Figure 13 shows the results of the QIC-GSC for different constraint values $\beta^2$, which is implemented using
the FD-NLMS based SPA.

Both the SPA and the stochastic gradient based SP-SDW-MWF increase the robustness of the GSC
(i.e., the SP-SDW-MWF without $\mathbf{w}_0$ and $1/\mu = 0$). For a given maximum allowable distortion $SD_{intellig}$,
the SP-SDW-MWF with and without $\mathbf{w}_0$ achieve a better noise reduction performance than the SPA. The
performance of the SP-SDW-MWF with $\mathbf{w}_0$ is, in contrast to the SP-SDW-MWF without $\mathbf{w}_0$, not affected
by microphone mismatch. In the absence of model errors, the SP-SDW-MWF with $\mathbf{w}_0$ achieves a slightly
worse performance than the SP-SDW-MWF without $\mathbf{w}_0$. With $\mathbf{w}_0$, the estimate of $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ is less
accurate due to the larger dimensions of $E\{\mathbf{y}^s\mathbf{y}^{s,H}\}$ (see also Figure 10).

In short, the proposed stochastic gradient implementation of the SP-SDW-MWF preserves the benefit of
the SP-SDW-MWF over the QIC-GSC.

4.4 Conclusions 

In this paper, we derived time-domain and frequency-domain stochastic gradient algorithms for the SP-
SDW-MWF and compared their performance to the SPA. Starting from the cost function of the SP-SDW-
MWF, a time-domain stochastic gradient algorithm has been derived in Section 4.1. In addition, the LMS
based algorithm [26] has been extended so that it applies to the SP-SDW-MWF. To increase convergence
speed and reduce complexity, a frequency-domain implementation has been proposed. Both the stochastic
gradient and the LMS based algorithm suffer from a large excess error when applied in highly time-varying
noise scenarios. In Section 4.2, we show that the excess error is reduced by applying a low pass filter to the
part of the gradient estimate that limits speech distortion. The low pass filtering avoids a highly time-varying
distortion of the desired speech component while not degrading the tracking performance needed in
time-varying noise scenarios. Section 4.3 compares the performance of the different frequency-domain
stochastic gradient algorithms for a hearing aid application. The stochastic gradient SP-SDW-MWF
outperforms the LMS based algorithm, while complexity is not increased. For a non-stationary noise
scenario, the LMS based and stochastic gradient SP-SDW-MWF suffer from a reasonably large excess error.
Experimental results show that the low pass filtering significantly improves the performance of the
stochastic gradient algorithm and does not compromise the tracking of changes in the noise scenario. In
addition, experiments demonstrate that the proposed stochastic gradient algorithm preserves the benefit of
the SP-SDW-MWF over the QIC-GSC. The limited computational cost and the better noise reduction
performance of the proposed algorithm make it a good alternative to the SPA for implementation in hearing
aids.

A Efficient frequency-domain implementation using correlation matrices

In Section 4 stochastic gradient algorithms in the time-domain and in the frequency-domain have been
developed for implementing the Speech Distortion Weighted Multichannel Wiener Filter (SDW-MWF).
These algorithms however require large data buffers for calculating the regularisation term required in the
filter update formulas, resulting in a large memory usage. In this addendum it is shown that by approximating
this regularisation term in the frequency-domain, (diagonal) speech and noise correlation matrices need
to be stored instead of data buffers, such that the memory usage is decreased drastically, while also the
computational complexity is further reduced. Experimental results demonstrate that this approximation
results in a small (positive or negative) performance difference, such that the proposed algorithm preserves
the robustness benefit of the SP-SDW-MWF over the QIC-GSC, while both its computational complexity
and memory usage are now comparable to the NLMS-based SPA for implementing the QIC-GSC.

A.1 Stochastic Gradient algorithms

In this section we first briefly review the stochastic gradient algorithm in the time-domain (cf. Section
4.1.1) and the calculation of the regularisation term $\mathbf{r}[k]$ (cf. Sections 4.1.1 and 4.2.1). We then show
that by approximating this regularisation term in the frequency-domain the memory usage can be reduced
drastically.

A.1.1 Time-Domain implementation 

In Section 4.1.1 a stochastic gradient algorithm in the time-domain has been developed for minimising the
cost function in (42), i.e.

$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho\left\{\mathbf{y}^n[k]\left(y_0^{n,*}[k-\Delta] - \mathbf{y}^{n,H}[k]\mathbf{w}[k]\right) - \mathbf{r}[k]\right\}$   (109)

$\mathbf{r}[k] = \frac{1}{\mu}\mathbf{y}^s[k]\mathbf{y}^{s,H}[k]\mathbf{w}[k]$   (110)

$\rho = \frac{\rho'}{\mathbf{y}^{n,H}[k]\mathbf{y}^n[k] + \frac{1}{\mu}\mathbf{y}^{s,H}[k]\mathbf{y}^s[k] + \delta}$   (111)

with $\rho$ the normalised step size of the adaptive algorithm, $\delta$ a small positive constant, and $\mathbf{w}[k]$, $\mathbf{y}^s[k]$, $\mathbf{y}^n[k]$
and $\mathbf{r}[k]$ $NL$-dimensional vectors. For $1/\mu = 0$ and no filter $\mathbf{w}_0$ present, (109) reduces to an NLMS-type
update formula often used in GSC, and operated during noise-only-periods [7, 10, 13, 42]. For $1/\mu \neq 0$,
the additional regularisation term $\mathbf{r}[k]$ limits speech distortion due to possible signal model errors.
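
A single iteration of an update of the form (109) with a normalisation as in (111) can be sketched in NumPy as follows. This is an illustrative sketch only: the regularisation term `r` and the speech energy are passed in explicitly, since the clean speech is not available in practice, and all function and argument names are hypothetical.

```python
import numpy as np

def sg_update(w, y_noise, y0_noise, r, rho_prime,
              inv_mu=0.0, speech_energy=0.0, delta=1e-8):
    """One stochastic gradient step: NLMS-type term minus a regularisation term r."""
    # Normalised step size: noise energy plus weighted speech energy plus a small constant.
    rho = rho_prime / (np.vdot(y_noise, y_noise).real + inv_mu * speech_energy + delta)
    # Conjugated error y_0^{n,*} - y^{n,H} w between the noise reference and the filter output.
    err_conj = np.conj(y0_noise) - np.vdot(y_noise, w)
    return w + rho * (y_noise * err_conj - r)

rng = np.random.default_rng(3)
NL = 8
y_noise = rng.standard_normal(NL)
y0_noise = 1.0
w0 = np.zeros(NL)

# With 1/mu = 0 and r = 0 the step is a plain NLMS update.
w1 = sg_update(w0, y_noise, y0_noise, r=np.zeros(NL), rho_prime=0.5)
err_before = abs(y0_noise - w0 @ y_noise)
err_after = abs(y0_noise - w1 @ y_noise)
```

With a normalised step size of 0.5 the a posteriori error is halved, which is the familiar NLMS behaviour that the regularisation term then trades off against speech distortion.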

In order to compute (110), knowledge about the (instantaneous) correlation matrix $\mathbf{y}^s[k]\mathbf{y}^{s,H}[k]$ of the
clean speech signal is required, which is obviously not available. In order to avoid the need for calibration,
it is suggested in Section 4.1.1 to store $L$-dimensional speech+noise-vectors $\mathbf{y}_i[k]$, $i = M-N \ldots M-1$,
during speech-periods in a circular speech+noise-buffer $\mathbf{B}_1 \in \mathbb{R}^{N \times L_{buf_1}}$ and to adapt the filter $\mathbf{w}[k]$ using
(109) during noise-only-periods²⁰, based on approximating the regularisation term in (110) by

$\mathbf{r}[k] = \frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\mathbf{y}_{buf_1}^H[k] - \mathbf{y}^n[k]\mathbf{y}^{n,H}[k]\right)\mathbf{w}[k]$,   (112)

with $\mathbf{y}_{buf_1}[k]$ a vector from the circular speech+noise-buffer $\mathbf{B}_1$, cf. (72). However, as has been indicated
in Section 4.1.1, this estimate of $\mathbf{r}[k]$ is quite poor, resulting in a large excess error, especially for small $\mu$
and large $\rho'$. Hence, it has been suggested to use an estimate of the average clean speech correlation matrix
$E\{\mathbf{y}^s[k]\mathbf{y}^{s,H}[k]\}$ in (110), such that $\mathbf{r}[k]$ can be computed as

$\mathbf{r}[k] = \frac{1}{\mu}(1-\lambda)\sum_{i=0}^{k}\lambda^{k-i}\left(\mathbf{y}_{buf_1}[i]\mathbf{y}_{buf_1}^H[i] - \mathbf{y}^n[i]\mathbf{y}^{n,H}[i]\right)\cdot\mathbf{w}[k]$,   (113)

with $\lambda$ an exponential weighting factor and the step size $\rho$ in (111) now equal to

$\rho = \frac{\rho'}{\mathbf{y}^{n,H}[k]\mathbf{y}^n[k] + \frac{1}{\mu}(1-\lambda)\sum_{i=0}^{k}\lambda^{k-i}\left|\mathbf{y}_{buf_1}^H[i]\mathbf{y}_{buf_1}[i] - \mathbf{y}^{n,H}[i]\mathbf{y}^n[i]\right| + \delta}$.

For stationary noise a small $\lambda$, i.e. $1/(1-\lambda) \sim NL$, suffices. However, in practice the speech and the
noise signals are often spectrally highly non-stationary (e.g. multi-talker babble noise), whereas their long-
term spectral and spatial characteristics usually vary more slowly in time. Spectrally highly non-stationary
noise can still be spatially suppressed by using an estimate of the long-term correlation matrix in $\mathbf{r}[k]$, i.e.
$1/(1-\lambda) \gg NL$.

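In order to avoid expensive matrix operations for computing (113), it is assumed in Section 4.2.1 that
$\mathbf{w}[k]$ varies slowly in time, i.e. $\mathbf{w}[k] \approx \mathbf{w}[i]$, such that (113) can be approximated with vector instead of
matrix operations by directly applying a low-pass filter to the regularisation term $\mathbf{r}[k]$, cf. (100),

$\mathbf{r}[k] \approx \frac{1}{\mu}(1-\lambda)\sum_{i=0}^{k}\lambda^{k-i}\left(\mathbf{y}_{buf_1}[i]\mathbf{y}_{buf_1}^H[i] - \mathbf{y}^n[i]\mathbf{y}^{n,H}[i]\right)\mathbf{w}[i]$   (114)
$\quad\;\; = \lambda\mathbf{r}[k-1] + (1-\lambda)\frac{1}{\mu}\left(\mathbf{y}_{buf_1}[k]\mathbf{y}_{buf_1}^H[k] - \mathbf{y}^n[k]\mathbf{y}^{n,H}[k]\right)\mathbf{w}[k]$.   (115)

However, as will be shown in the next paragraph, this assumption is actually not required in a frequency-
domain implementation.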
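
The step from the exponentially weighted sum to the first-order recursion can be verified numerically. The sketch below assumes a zero initial state, real-valued random data and hypothetical function names; each entry of `terms` stands for the instantaneous vector $(\mathbf{y}_{buf_1}\mathbf{y}_{buf_1}^H - \mathbf{y}^n\mathbf{y}^{n,H})\mathbf{w}$ at one time index.

```python
import numpy as np

def regularisation_sum(terms, lam, inv_mu):
    """Explicit exponentially weighted sum over all past instantaneous terms."""
    k = len(terms) - 1
    return inv_mu * (1 - lam) * sum(lam ** (k - i) * t for i, t in enumerate(terms))

def regularisation_recursive(terms, lam, inv_mu):
    """First-order low-pass recursion started from a zero state."""
    r = np.zeros_like(terms[0])
    for t in terms:
        r = lam * r + (1 - lam) * inv_mu * t
    return r

rng = np.random.default_rng(4)
NL, K = 5, 50
terms = []
for _ in range(K):
    y_buf, y_n, w = rng.standard_normal((3, NL))
    # (y y^T) w evaluated as y (y^T w): vector operations only.
    terms.append(y_buf * (y_buf @ w) - y_n * (y_n @ w))

r_sum = regularisation_sum(terms, lam=0.9, inv_mu=0.5)
r_rec = regularisation_recursive(terms, lam=0.9, inv_mu=0.5)
```

Both forms give the same vector, but the recursion needs only the previous state, which is why no history of past terms has to be stored.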

²⁰ In Section 4.1.1 it has been shown that storing noise-only-vectors $\mathbf{y}_i[k] = \mathbf{y}_i^n[k]$, $i = 0 \ldots M-1$ during noise-only-periods
in a circular noise-buffer $\mathbf{B}_2 \in \mathbb{R}^{M \times L_{buf_2}}$ additionally allows adaptation during speech-periods.

A.1.2 Efficient Frequency-Domain Implementation

In Section 4.2.2 the (improved) stochastic gradient algorithm in the time-domain has been converted to
a frequency-domain implementation by using a block-formulation and overlap-save procedures (similar
to standard adaptive filtering techniques in the frequency-domain [43]). However, the frequency-domain
algorithm described in Section 4.2.2 (Algorithm 3) requires many data buffers and hence the storage of a
large amount of data²¹. A substantial memory (and computational complexity) reduction can be achieved
by the following two steps:

• When using (113) instead of (115) for calculating the regularisation term, correlation matrices instead
of data samples need to be stored. The frequency-domain implementation of the resulting algorithm
is then summarised in Algorithm 4, where $2L \times 2L$-dimensional speech and noise correlation ma-
trices $\mathbf{S}_{ij}[k]$ and $\mathbf{S}^n_{ij}[k]$, $i, j = M-N \ldots M-1$ are used for calculating the regularisation term
$\mathbf{R}_i[k]$ and (part of) the step size $\mathbf{\Lambda}[k]$. These correlation matrices are updated respectively during
speech-periods and noise-only-periods²². However, this first step does not necessarily reduce the
memory usage ($N L_{buf_1}$ for data buffers vs. $2(NL)^2$ for correlation matrices) and will even increase
the computational complexity, since the correlation matrices are not diagonal.

• The correlation matrices in the frequency-domain can be approximated by diagonal matrices, since
$\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}$ in Algorithm 4 can be well approximated by $\mathbf{I}_{2L}/2$ [44, 45]. Hence, the speech and the
noise correlation matrices are updated as

$\mathbf{S}_{ij}[k] = \lambda\mathbf{S}_{ij}[k-1] + (1-\lambda)\mathbf{Y}_i^H[k]\mathbf{Y}_j[k]/2$,   (116)
$\mathbf{S}^n_{ij}[k] = \lambda\mathbf{S}^n_{ij}[k-1] + (1-\lambda)\mathbf{Y}_i^{n,H}[k]\mathbf{Y}_j^n[k]/2$,   (117)

leading to a significant reduction in memory usage and computational complexity, cf. Section A.2,
while having a minimal impact on the performance and the robustness, cf. Section A.3. We will refer
to this algorithm as Algorithm 5. Algorithm 5 is in fact quite similar to the algorithm presented in
[46], which is derived directly from a frequency-domain cost function. Some major differences how-
ever exist, e.g. in [46] the regularisation term is absent, the term $\mathbf{F}\mathbf{g}\mathbf{F}^{-1}$ is also approximated
by $\mathbf{I}_{2L}/2$ and the speech and the noise correlation matrices are block-diagonal.
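
The diagonal approximation can be checked numerically. The NumPy sketch below builds $\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}$ for a small, arbitrarily chosen $L$, inspects its diagonal, and then applies a diagonal correlation update in the spirit of (116) and (117). The function name `update_diag_corr` and the use of 1-D arrays for the diagonals are assumptions of this illustration.

```python
import numpy as np

L = 16
F = np.fft.fft(np.eye(2 * L))                    # 2L x 2L DFT matrix
k = np.hstack([np.zeros((L, L)), np.eye(L)])     # k = [0_L  I_L]
G = F @ k.T @ k @ np.linalg.inv(F)               # F k^T k F^{-1}
diag_G = np.real(np.diag(G))                     # every diagonal entry equals 1/2

def update_diag_corr(S_prev, Y_i, Y_j, lam):
    """Diagonal correlation update; all arguments hold matrix diagonals as 1-D arrays."""
    return lam * S_prev + (1 - lam) * np.conj(Y_i) * Y_j / 2

rng = np.random.default_rng(5)
Y = np.fft.fft(rng.standard_normal(2 * L))       # diagonal of one Y_i[k]
S = update_diag_corr(np.zeros(2 * L, dtype=complex), Y, Y, lam=0.0)
```

The diagonal of the exact matrix is exactly one half; the approximation consists of discarding its off-diagonal entries, which is what turns each correlation matrix into a vector of $2L$ values.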

A.2 Memory usage and computational complexity 

Table 2 summarises the computational complexity and the memory usage of the frequency-domain NLMS-
based SPA for implementing the QIC-GSC [14]²³ and the frequency-domain stochastic gradient algorithms
for implementing the SDW-MWF (Algorithm 3 and Algorithm 5). As in Section 4.2.3, the computational
complexity is expressed as the number of operations per second (MIPS), while the memory usage is ex-
pressed in kWords. We assume that one complex multiplication is equivalent to 4 real multiplications and 2

²¹ For good performance, typical values for the buffer lengths $L_{buf_1}$ and $L_{buf_2}$ of the circular buffers $\mathbf{B}_1$ and
$\mathbf{B}_2$ are 10000 ... 20000.

²² When using correlation matrices, filter adaptation can only take place during noise-only-periods, since during speech-periods
the desired signal $\mathbf{d}[k]$ cannot be constructed from the noise-buffer $\mathbf{B}_2$ any more.

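²³ The computational complexity of the frequency-domain QIC-GSC using SPA also represents the complexity when the adaptive
filter is only updated during noise-only-periods.

Algorithm 4: Frequency-domain implementation with correlation matrices (without approximation)

Initialisation and matrix definitions:
  $\mathbf{W}_i[0] = [0 \;\cdots\; 0]^T$, $i = M-N \ldots M-1$
  $P_m[0] = \delta_m$, $m = 0 \ldots 2L-1$
  $\mathbf{F}$ = $2L \times 2L$-dimensional DFT matrix
  $\mathbf{g} = \begin{bmatrix} \mathbf{I}_L & \mathbf{0}_L \\ \mathbf{0}_L & \mathbf{0}_L \end{bmatrix}$, $\mathbf{k} = [\mathbf{0}_L \;\; \mathbf{I}_L]$
  $\mathbf{0}_L$ = $L \times L$-dimensional matrix with zeros, $\mathbf{I}_L$ = $L \times L$-dimensional identity matrix

For each new block of L samples (per channel):
  $\mathbf{d}[k] = [y_0[kL-\Delta] \;\cdots\; y_0[kL-\Delta+L-1]]^T$
  $\mathbf{Y}_i[k] = \mathrm{diag}\{\mathbf{F}[y_i[kL-L] \;\cdots\; y_i[kL+L-1]]^T\}$, $i = M-N \ldots M-1$

Output signal:
  $\mathbf{e}[k] = \mathbf{d}[k] - \mathbf{k}\mathbf{F}^{-1}\sum_{j=M-N}^{M-1}\mathbf{Y}_j[k]\mathbf{W}_j[k]$, $\mathbf{E}[k] = \mathbf{F}\mathbf{k}^T\mathbf{e}[k]$

If speech detected:
  $\mathbf{S}_{ij}[k] = (1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\mathbf{Y}_i^H[l]\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\mathbf{Y}_j[l] = \lambda\mathbf{S}_{ij}[k-1] + (1-\lambda)\mathbf{Y}_i^H[k]\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\mathbf{Y}_j[k]$

If noise detected: $\mathbf{Y}_i[k] = \mathbf{Y}_i^n[k]$
  $\mathbf{S}^n_{ij}[k] = (1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\mathbf{Y}_i^{n,H}[l]\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\mathbf{Y}_j^n[l] = \lambda\mathbf{S}^n_{ij}[k-1] + (1-\lambda)\mathbf{Y}_i^{n,H}[k]\mathbf{F}\mathbf{k}^T\mathbf{k}\mathbf{F}^{-1}\mathbf{Y}_j^n[k]$

Update formula (only during noise-only-periods):
  $\mathbf{W}_i[k+1] = \mathbf{W}_i[k] + \mathbf{F}\mathbf{g}\mathbf{F}^{-1}\mathbf{\Lambda}[k]\left\{\mathbf{Y}_i^{n,H}[k]\mathbf{E}[k] - \mathbf{R}_i[k]\right\}$, $i = M-N \ldots M-1$

with
  $\mathbf{\Lambda}[k] = \frac{2\rho'}{L}\,\mathrm{diag}\{P_0^{-1}[k], \ldots, P_{2L-1}^{-1}[k]\}$
  $P_m[k] = \gamma P_m[k-1] + (1-\gamma)\left(P_{1,m}[k] + P_{2,m}[k]\right)$, $m = 0 \ldots 2L-1$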

Table 2: Computational complexity and memory usage. Entries marked … are illegible in the source.

Computational complexity

Algorithm                    Update formula                                            Adaptation of step size        MIPS
NLMS based SPA (QIC-GSC)     (14M - 11 - …) + (6M - 2) log2 2L MAC + 1/L Sq + 1/L D    (2M + 2) MAC + 1 D             …
SG with LP (Algorithm 3)     (26N + 4 - …) + (6N + 10) log2 2L MAC                     (4N + 6) MAC + 1 D + 1 Abs     …
SG with LP (Algorithm 5)     (10N^2 + 13N - …) + (6N + 4) log2 2L MAC                  (2N + 4) MAC + 1 D + 1 Abs     2.71 (a), 4.31 (b)

Memory usage

Algorithm      Memory usage        kWords
QIC-GSC        4(M - 1)L + 6L      0.45
Algorithm 3    …                   40.61 (a), 60.80 (b)
Algorithm 5    …                   …

(a) N = M - 1, (b) N = M; f_s = 16 kHz, L_buf1 = 10000.



real additions and that a $2L$-point FFT of a real input vector requires $2L\log_2 2L$ real MACs (assuming the
radix-2 FFT algorithm). From this table we can draw the following conclusions:

• The computational complexity of the SDW-MWF (Algorithm 3) with filter $\mathbf{w}_0$ is about twice the
complexity of the QIC-GSC (and even less if the filter $\mathbf{w}_0$ is not present). The approximation of the
regularisation term in Algorithm 5 further reduces the computational complexity. However, this only
remains true for a small number of input channels, since the approximation introduces a quadratic
term $O(N^2)$.

• Due to the storage of the data samples used in the circular speech+noise-buffer $\mathbf{B}_1$, the memory usage
of the SDW-MWF (Algorithm 3) is quite high in comparison with the QIC-GSC (depending on the
size of the data buffer $L_{buf_1}$ of course). By using the approximation of the regularisation term in
Algorithm 5, the memory usage can be reduced drastically, since now diagonal correlation matrices
instead of data buffers need to be stored. Note however that also for the memory usage a quadratic
term $O(N^2)$ is present.
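
The memory figures discussed above can be reproduced with a short calculation. The sketch below uses the QIC-GSC expression $4(M-1)L + 6L$ from Table 2 and the two storage counts quoted in Section A.1.2 ($N L_{buf_1}$ words for the data buffer and $2(NL)^2$ words for non-diagonal correlation matrices). The function names and the choice $M = 3$, $L = 32$, $L_{buf_1} = 10000$ are assumptions taken to match the experimental set-up.

```python
def qic_gsc_memory_words(M: int, L: int) -> int:
    """Memory of the QIC-GSC in words, using the expression listed in Table 2."""
    return 4 * (M - 1) * L + 6 * L

def buffer_words(N: int, L_buf1: int) -> int:
    """Words needed for a circular speech + noise data buffer with N channels."""
    return N * L_buf1

def full_corr_words(N: int, L: int) -> int:
    """Words needed when non-diagonal correlation matrices are stored instead."""
    return 2 * (N * L) ** 2

M, L, L_buf1 = 3, 32, 10000
qic_kwords = qic_gsc_memory_words(M, L) / 1000      # 0.448 kWords, listed as 0.45

# Small N and L: full correlation matrices are cheaper than the data buffer.
small_case = (buffer_words(2, L_buf1), full_corr_words(2, L))
# Larger N and L: the quadratic growth makes them more expensive than the buffer.
large_case = (buffer_words(3, L_buf1), full_corr_words(3, 64))
```

The second case illustrates why storing full correlation matrices does not necessarily save memory and why the diagonal approximation of Algorithm 5 is the step that produces the reduction.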



A.3 Experimental results

In this paragraph it is shown that practically no performance difference exists between Algorithm 3 and Al-
gorithm 5, such that the SDW-MWF using the implementation proposed in this addendum indeed preserves
its robustness benefit over the GSC (and the QIC-GSC).

A.3.1 Set-up

The same set-up has been used as in Section 4.3.1. A 3-microphone BTE hearing aid with omnidirectional
microphones (Knowles FG-3452) has been mounted on a dummy head in an office room. The interspacing $d$
between the first and the second microphone is about $d = 1$ cm and the interspacing between the second and
the third microphone is about 1.5 cm. The reverberation time $T_{60dB}$ is about 700 ms for a speech weighted
noise. The desired speech source and the noise sources are positioned at a distance of 1 m from the head.
The desired speech source is positioned in front of the head (at 0°) and consists of English sentences. The
noise scenario consists of five multi-talker babble noise sources, positioned at 75°, 120°, 180°, 240° and
285°. The desired signal and the total noise signal both have a level of 70 dB SPL at the centre of the head.
For evaluation purposes, the speech and the noise signals have been recorded separately.

The microphone signals are pre-whitened prior to processing to improve intelligibility [38], and the
output is accordingly de-whitened. In the experiments, the microphones have been calibrated by means
of recordings of an anechoic speech weighted noise signal positioned at 0°, measured while the BTE was
mounted on the head. A delay-and-sum beamformer is used as the fixed beamformer and the blocking
matrix pairwise subtracts the time-aligned calibrated microphone signals.

The performance of the stochastic gradient algorithms in the frequency-domain is evaluated for a filter
length $L = 32$ per channel, $\rho' = 0.8$, $\gamma = 0.95$ and $\lambda = 0.9998$. For all considered algorithms, filter
adaptation only takes place during noise-only-periods. To exclude the effect of the spatial pre-processor, the
performance measures are calculated with respect to the output of the fixed beamformer. The sensitivity of
the algorithms against errors in the assumed signal model is illustrated for microphone mismatch, i.e. a gain
mismatch $\Upsilon_2 = 4$ dB at the second microphone.

A.3.2 Experimental results

Figures 14 and 15 depict the SNR improvement $\Delta SNR_{intellig}$ and the speech distortion $SD_{intellig}$ of the SP-
SDW-MWF (with $\mathbf{w}_0$) and the SDR-GSC (without $\mathbf{w}_0$), implemented using Algorithm 3 (solid line) and
Algorithm 5 (dashed line), as a function of the trade-off parameter $1/\mu$. These figures also depict the effect
of a gain mismatch $\Upsilon_2 = 4$ dB at the second microphone. From these figures it can be observed that
approximating the regularisation term only results in a small performance difference. For most scenarios
the performance is even better (i.e. larger SNR improvement and smaller speech distortion) for Algorithm
5 than for Algorithm 3, probably since in Algorithm 3 the additional assumption is used that the filter $\mathbf{w}[k]$
varies slowly in time, cf. (115).

Hence, also when unplementing the SDW-MWF usmg the proposed Algorithm 5. it still preserves its 
robustness benefit over the GSC (and the QIC-GSC). E.g. it can be observed that the GSC (i.e. SDR-GSC 
with 1/m = 0) wiU result m a large speech distortion (and a smaller SNR unprovement) when microphone 
mismatch occurs. Both the SDR-GSC and the SP-SDW-MWF add robustness to the GSC, i.e. the dis- 
tortion decreases for mcreasmg l//z. The performance of the SP-SDW-MWF is agam hardly affected by 
microphone mismatcL 
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A.4 Conclusion 



In this addendum we have shown that the memory usage (and the computational complexity) of the
SDW-MWF can be reduced drastically by approximating the regularisation term in the frequency domain, i.e. by
computing the regularisation term using (diagonal) frequency-domain correlation matrices instead of
time-domain data buffers. It has been shown that approximating the regularisation term only results in a small
performance difference, such that the robustness benefit of the SDW-MWF is preserved, while now both the
computational complexity and the memory usage are comparable to those of the QIC-GSC.

Figure 14: SNR improvement of frequency-domain SP-SDW-MWF (Algorithm 3 and Algorithm 5) in a
multiple noise source scenario.

Figure 15: Speech distortion of frequency-domain SP-SDW-MWF (Algorithm 3 and Algorithm 5) in a
multiple noise source scenario.

Figure 1: Concept of the Generalized Sidelobe Canceller.

Figure 2: Equivalent approach of multi-channel Wiener filtering.

Figure 3: Spatially Pre-processed SDW-MWF.

Figure 4: Decomposition of SP-SDW-MWF with w0 in a multi-channel filter w,, and single-channel post-

Figure 7: ΔSNRintellig and SDintellig for QIC-GSC as a function of β² for different gain mismatches Υ2 at the
second microphone.

Figure 8: Complexity (expressed in Mops) of TD and FD Stochastic Gradient (SG) algorithm with LP
filter as a function of the filter length L per channel; M = 3. For comparison, the complexity of the standard
NLMS ANC and SPA is depicted too.

Figure 9: Performance of different FD Stochastic Gradient (FD-SG) algorithms; (a) Stationary speech-like
noise at 90°; (b) Multi-talker babble noise at 90°.

Figure 10: Influence of LP filter on performance of FD stochastic gradient SP-SDW-MWF = 0.
without w0 and with w0. Babble noise at 90°.

Figure 11: Convergence behavior of FD-SG for λ = 0 and λ = 0.9998. The noise source position suddenly
changes from 90° to 180° and vice versa.

Figure 12: Performance of FD stochastic gradient implementation of SP-SDW-MWF with LP (λ = 0.9998)
in a multiple noise source scenario.

Figure 13: Performance of FD SPA in a multiple noise source scenario.